Posted to j-dev@xerces.apache.org by Andy Clark <an...@apache.org> on 2000/07/19 01:47:57 UTC

Re: Xerces Redesign: REQUIREMENTS

Requirements for any new design should center on customer 
requirements. Luckily, we're all customers of Xerces so we
already have a good idea regarding what is important. But I
also don't want to neglect people who are building commercial
products, or the server-oriented folks for whom performance is
paramount.

First, I would list the following basic requirements:

  Standards Compliance 
    XML 1.0 
    Namespaces 1.0 
    DOM Level 1, Level 2 
    SAX 1.0, 2.0 
    XML Schema 
  Performance 
  Simplicity 
  Extensibility 
  Maintainability 

I think we can pretty much agree on all of them, with SAX 1.0
perhaps being the exception. I would like to see Xerces
support it, while others feel everyone should move up to
SAX 2.0. This is an issue we can discuss, though.

Also, the design should accommodate the following features
and allow new features to be added with ease:

  Core features and properties 
  Error handling 
  Grammar access 
  Grammar caching 
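A minimal Java sketch of one way to make "core features and properties" extensible, assuming a SAX 2.0 style design where each feature or property is identified by a URI and each component registers the IDs it understands. FeatureRegistry and its methods are hypothetical names, not the API of the Xerces 2 skeleton; only org.xml.sax.SAXNotRecognizedException is a real SAX class.

```java
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.SAXNotRecognizedException;

/** Hypothetical registry of recognized features and properties, keyed by URI. */
public class FeatureRegistry {
    private final Map<String, Boolean> features = new HashMap<>();
    private final Map<String, Object> properties = new HashMap<>();

    /** A component declares a feature it understands, with its default state. */
    public void addRecognizedFeature(String featureId, boolean defaultState) {
        features.put(featureId, defaultState);
    }

    /** A component declares a property it understands, with its default value. */
    public void addRecognizedProperty(String propertyId, Object defaultValue) {
        properties.put(propertyId, defaultValue);
    }

    public void setFeature(String featureId, boolean state) throws SAXNotRecognizedException {
        if (!features.containsKey(featureId)) {
            throw new SAXNotRecognizedException(featureId);  // unknown IDs are rejected
        }
        features.put(featureId, state);
    }

    public boolean getFeature(String featureId) throws SAXNotRecognizedException {
        Boolean state = features.get(featureId);
        if (state == null) {
            throw new SAXNotRecognizedException(featureId);
        }
        return state;
    }

    public void setProperty(String propertyId, Object value) throws SAXNotRecognizedException {
        if (!properties.containsKey(propertyId)) {
            throw new SAXNotRecognizedException(propertyId);
        }
        properties.put(propertyId, value);
    }

    public Object getProperty(String propertyId) throws SAXNotRecognizedException {
        // containsKey, not get() != null, so a registered null default is still "recognized"
        if (!properties.containsKey(propertyId)) {
            throw new SAXNotRecognizedException(propertyId);
        }
        return properties.get(propertyId);
    }
}
```

Adding a new feature then means one addRecognizedFeature call from the component that implements it (for example, a validator registering http://xml.org/sax/features/validation with a default of false), and an unknown ID is reported instead of silently ignored.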

This set of requirements was taken into consideration when
making the skeleton that is checked in under the Xerces 2
branch. See my previous posting for the overall design.

Has anyone had a chance to look it over yet? It might be
easier to read if you detach the HTML and the CSS so that
you can see the highlighting.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Xerces Redesign: REQUIREMENTS

Posted by Andy Clark <an...@apache.org>.
James Duncan Davidson wrote:
> There's no need to generate code at runtime to do this. You can do 
> all sorts of things at runtime with properties and classnames and 
> Class.forName() to make things happen.

Using dynamic features too much would make it harder to port to
C++. There are obvious differences between the languages that
will influence the code but I'd like to keep them as close as
possible.

> Generating custom classes is problematic. Then you have versioning 
> problems, issues if you have two parsers in your classpath, and it 
> generally feels like something that's statically compiled and not 
> designed to run in a dynamically loaded, dynamically linked system.

Yes, custom classes are a problem. Anybody have any ideas?

> I guess you haven't read my full comments. :) I didn't say that 
> we had to provide everything as separate zip/tgz files. I said 
> that the parser should be able to be built in pieces -- much 
> different. What we ship as an official distro is orthogonal to 
> the targets that we have in our build file and the internal 
> structure of the set of jar files in that distro.

Got it and I agree. Sounds like you are the build person. ;)

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Xerces Redesign: REQUIREMENTS

Posted by James Duncan Davidson <du...@x180.com>.
on 7/21/00 1:57 PM, Andy Clark at andyc@apache.org wrote:

> James Duncan Davidson wrote:
>> I don't see why they have to -- if you think of a build of modules
>> as a set of build targets, you should be able to build what you want
>> out of the source tree. Just want to build the main parser + sax --
>> "./build sax"
> 
> Okay, so how do we do this in the build file?

Using syntax not tied to make or ant :)

    main depends on parser, sax, rdom, ddom, rwdom
    dist depends on main

That's what dependencies are for.
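The dependency lines above can be resolved with a depth-first walk that emits each target only after everything it depends on, which is roughly what make or Ant do. A small Java sketch, kept tool-neutral like the post; BuildOrder and target() are hypothetical names, and the "sax depends on parser" edge is an assumption read from "build the main parser + sax". A dependency cycle is reported as an error, since the scheme only works if module interdependencies stay well defined.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical tool-neutral model of "X depends on Y, Z" build targets. */
public class BuildOrder {
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    /** Declares a target and the targets it depends on (in order). */
    public void target(String name, String... dependsOn) {
        deps.put(name, Arrays.asList(dependsOn));
    }

    /** Returns the targets to build, dependencies first, to produce the goal. */
    public List<String> resolve(String goal) {
        List<String> order = new ArrayList<>();
        visit(goal, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private void visit(String t, Set<String> inProgress, Set<String> done, List<String> order) {
        if (done.contains(t)) {
            return;                                   // already scheduled once
        }
        if (!inProgress.add(t)) {
            throw new IllegalStateException("Dependency cycle at target: " + t);
        }
        // Undeclared targets (e.g. "rdom" here) are treated as leaves with no dependencies.
        for (String d : deps.getOrDefault(t, Collections.emptyList())) {
            visit(d, inProgress, done, order);
        }
        inProgress.remove(t);
        done.add(t);
        order.add(t);                                 // post-order: after all its dependencies
    }

    public static void main(String[] args) {
        BuildOrder build = new BuildOrder();
        build.target("sax", "parser");                // assumption: SAX support needs the core parser
        build.target("main", "parser", "sax", "rdom", "ddom", "rwdom");
        build.target("dist", "main");
        System.out.println(build.resolve("sax"));     // [parser, sax]
        System.out.println(build.resolve("dist"));    // [parser, sax, rdom, ddom, rwdom, main, dist]
    }
}
```

With these declarations, "./build sax" would only touch parser and sax, while "./build dist" pulls in everything main depends on.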

> Depending on the target, do we generate a custom parser instance class that
> gets compiled? This is a possibility. And I actually prefer it because I'd
> like to have the output always be "DOMParser" even if the user doesn't want
> validation but does want a DOM parser.

There's no need to generate code at runtime to do this. You can do all sorts
of things at runtime with properties and classnames and Class.forName() to
make things happen.
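A minimal Java sketch of the runtime approach described above, assuming the variable part of the parser sits behind a configuration interface and the concrete class is named in a system property. ConfigLoader, ParserConfiguration, and the example.parser.configuration property are hypothetical; Class.forName, Class.asSubclass, and getDeclaredConstructor().newInstance() are standard Java reflection. The application-facing class (say, DOMParser) never changes, so nothing has to be generated or compiled per build.

```java
/** Hypothetical: picks the parser configuration at runtime by class name. */
public class ConfigLoader {
    /** Hypothetical pluggable configuration sitting behind a fixed DOMParser class name. */
    public interface ParserConfiguration {
        boolean isValidating();
    }

    public static class NonValidatingConfiguration implements ParserConfiguration {
        @Override public boolean isValidating() { return false; }
    }

    public static class ValidatingConfiguration implements ParserConfiguration {
        @Override public boolean isValidating() { return true; }
    }

    /** Hypothetical system property naming the configuration class to load. */
    public static final String CONFIG_PROPERTY = "example.parser.configuration";

    /** Loads and instantiates the named class, checking that it really is a ParserConfiguration. */
    public static ParserConfiguration load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        // asSubclass throws ClassCastException if the class is not a ParserConfiguration
        return cls.asSubclass(ParserConfiguration.class)
                  .getDeclaredConstructor()
                  .newInstance();
    }

    /** Reads the class name from a system property, defaulting to the non-validating setup. */
    public static ParserConfiguration loadFromSystemProperty() throws ReflectiveOperationException {
        String className = System.getProperty(CONFIG_PROPERTY,
                NonValidatingConfiguration.class.getName());
        return load(className);
    }
}
```

If the validating configuration lived in its own jar, a light distribution would simply never name it, so it would not need to be on the classpath. The portability concern from Andy's reply still applies: reflective loading has no direct C++ counterpart, so a port would need something like a name-to-factory table instead.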

Generating custom classes is problematic. Then you have versioning problems,
issues if you have two parsers in your classpath, and it generally feels like
something that's statically compiled and not designed to run in a
dynamically loaded, dynamically linked system.

> I guess you haven't fielded the hordes of complaints from people
> that have had to download ever-larger ZIP/TGZ files. The
> build scripts are separate. People still want to download less
> in order to get what they want.

I guess you haven't read my full comments. :) I didn't say that we had to
provide everything as separate zip/tgz files. I said that the parser should
be able to be built in pieces -- much different. What we ship as an
official distro is orthogonal to the targets that we have in our build file
and the internal structure of the set of jar files in that distro.

.duncan


Re: Xerces Redesign: REQUIREMENTS

Posted by Andy Clark <an...@apache.org>.
Ed Staub wrote:
> I had thought that this was settled earlier, to the effect that we'd put out
> two deployables:
>         - a "kitchen sink" jar as at present
>         - a set of "module" jars which together contain the same files as the
> "kitchen sink".
> 
> Is this ok with everyone?

As long as they are in separate downloadable ZIP/TGZs. Which is
why I was heading towards just downloading the separate modules.
Perhaps somewhere in between is a happy medium.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

RE: Xerces Redesign: REQUIREMENTS

Posted by Ed Staub <es...@mediaone.net>.
Andy Clark wrote:

> James Duncan Davidson wrote:
>> Here I disagree with you. Coordinating the code across all these
>> spaces would be more of a pain than it's worth. The code should
>> be in separate *packages* -- and build into separate jars imho.
>> But it should be possible to "./build all" and get the whole shebang.

> I guess you haven't fielded the hordes of complaints from people
> that have had to download ever-larger ZIP/TGZ files. The
> build scripts are separate. People still want to download less
> in order to get what they want.

It's clear that there are two communities here with differing requirements.
My own usage tends toward Andy's; I have to edit classpaths in script files
too often to want a lot of extra files.

I had thought that this was settled earlier, to the effect that we'd put out
two deployables:
	- a "kitchen sink" jar as at present
	- a set of "module" jars which together contain the same files as the
"kitchen sink".

Is this ok with everyone?

--------------

<tangent>By the way, the mail archives will hopefully be brought up to date
today, according to Dirk-Willem van Gulik.  There was a hardware
problem.</tangent>

-Ed Staub


-----Original Message-----
From: Andy Clark [mailto:andyc@apache.org]
Sent: Friday, July 21, 2000 4:58 PM
To: xerces-j-dev@xml.apache.org
Subject: Re: Xerces Redesign: REQUIREMENTS


James Duncan Davidson wrote:
> I don't see why they have to -- if you think of a build of modules
> as a set of build targets, you should be able to build what you want
> out of the source tree. Just want to build the main parser + sax --
> "./build sax"

Okay, so how do we do this in the build file? Depending on the
target, do we generate a custom parser instance class that gets
compiled? This is a possibility. And I actually prefer it
because I'd like to have the output always be "DOMParser" even
if the user doesn't want validation but does want a DOM parser.

The design I posted separates all of the basic functionality
into a series of base classes. Then DOMParser and SAXParser
become very simple wrappers on top of the basic document
parsing class.

> Here I disagree with you. Coordinating the code across all these
> spaces would be more of a pain than it's worth. The code should
> be in separate *packages* -- and build into separate jars imho.
> But it should be possible to "./build all" and get the whole shebang.

I guess you haven't fielded the hordes of complaints from people
that have had to download ever-larger ZIP/TGZ files. The
build scripts are separate. People still want to download less
in order to get what they want.

--
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org



Re: Xerces Redesign: REQUIREMENTS

Posted by Andy Clark <an...@apache.org>.
James Duncan Davidson wrote:
> I don't see why they have to -- if you think of a build of modules 
> as a set of build targets, you should be able to build what you want 
> out of the source tree. Just want to build the main parser + sax -- 
> "./build sax"

Okay, so how do we do this in the build file? Depending on the
target, do we generate a custom parser instance class that gets
compiled? This is a possibility. And I actually prefer it
because I'd like to have the output always be "DOMParser" even
if the user doesn't want validation but does want a DOM parser.

The design I posted separates all of the basic functionality
into a series of base classes. Then DOMParser and SAXParser
become very simple wrappers on top of the basic document
parsing class.
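A minimal Java sketch of the layering described above: one base class owns the parsing loop and reports events to a handler, and DOMParser and SAXParser are thin wrappers that differ only in what they do with those events. All class names here are hypothetical, the scanner is a toy that understands only bare start, end, and empty-element tags (no attributes, text, or well-formedness checks), and the DOM wrapper builds a standard org.w3c.dom.Document through JAXP.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class WrapperSketch {
    /** Hypothetical callback interface driven by the shared parsing class. */
    public interface DocumentHandler {
        void startElement(String name);
        void endElement(String name);
    }

    /** Hypothetical base class holding the parsing logic (a toy, tag-only scanner). */
    public static class DocumentParser {
        protected DocumentHandler handler;

        public void parse(String doc) {
            int i = 0;
            while ((i = doc.indexOf('<', i)) >= 0) {
                int end = doc.indexOf('>', i);
                if (end < 0) throw new IllegalArgumentException("unterminated tag at " + i);
                String tag = doc.substring(i + 1, end);
                if (tag.startsWith("/")) {                 // end tag: </name>
                    handler.endElement(tag.substring(1));
                } else if (tag.endsWith("/")) {            // empty-element tag: <name/>
                    String name = tag.substring(0, tag.length() - 1);
                    handler.startElement(name);
                    handler.endElement(name);
                } else {                                   // start tag: <name>
                    handler.startElement(tag);
                }
                i = end + 1;
            }
        }
    }

    /** Thin wrapper: forwards events to whatever handler the application supplies. */
    public static class SAXParser extends DocumentParser {
        public void setHandler(DocumentHandler h) { handler = h; }
    }

    /** Thin wrapper: turns the same events into a W3C DOM tree. */
    public static class DOMParser extends DocumentParser implements DocumentHandler {
        private final Document document;
        private Node current;

        public DOMParser() {
            try {
                document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
            current = document;
            handler = this;                                // the wrapper is its own handler
        }

        @Override public void startElement(String name) {
            Element element = document.createElement(name);
            current.appendChild(element);
            current = element;
        }

        @Override public void endElement(String name) {
            current = current.getParentNode();
        }

        public Document getDocument() { return document; }
    }
}
```

The point of the split is that validation, grammar handling, and so on would live in the base class, and each API wrapper stays a few dozen lines.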

> Here I disagree with you. Coordinating the code across all these 
> spaces would be more of a pain than it's worth. The code should 
> be in separate *packages* -- and build into separate jars imho. 
> But it should be possible to "./build all" and get the whole shebang.

I guess you haven't fielded the hordes of complaints from people
that have had to download ever-larger ZIP/TGZ files. The
build scripts are separate. People still want to download less
in order to get what they want.

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

RE: Xerces Redesign: REQUIREMENTS

Posted by Paulo Gaspar <pa...@krankikom.de>.
I find Duncan's views on packaging so obviously right, simple, and
flexible that I am having trouble understanding why anyone would
oppose them.

The only complexity they add is the housekeeping of the different builds.
Is that much?

It has no adverse impact on architecture - just enforces a good one:
 - Modules should be - obviously - MODULAR (as opposed to promiscuous);
 - Module interdependencies must be WELL DEFINED and simplified by design.

If the above conditions are respected, defining several build
configurations should be an (almost) trivial administrative task.


Have fun,
Paulo Gaspar


> -----Original Message-----
> From: James Duncan Davidson [mailto:james.davidson@eng.sun.com]
> Sent: Friday, July 21, 2000 09:31
>
> on 7/18/00 7:46 PM, Andy Clark at andyc@apache.org wrote:
>
> > I don't see why this needs to be folded into the Xerces build.
>
> I don't see why they have to -- if you think of a build of
> modules as a set of build targets, you should be able to build
> what you want out of the source tree. Just want to build the main
> parser + sax -- "./build sax"
>
> In fact, I could see a Xerces-Light dist that just had a few
> things in it, a Xerces-Full dist that had *everything* in it --
> and whatever other dists people thought valuable. But mostly in
> the range between light and full, I'd like app programmers to
> choose their poison.
>
> > In fact, I'd like to move away from "everything *and* the
> > kitchen sink" approach where all donations get rolled into the
> > main code release.
>
> Here I disagree with you. Coordinating the code across all these spaces
> would be more of a pain than it's worth. The code should be in separate
> *packages* -- and build into separate jars imho. But it should be possible
> to "./build all" and get the whole shebang.
>
> .duncan


Re: Xerces Redesign: REQUIREMENTS

Posted by James Duncan Davidson <ja...@eng.sun.com>.
on 7/18/00 7:46 PM, Andy Clark at andyc@apache.org wrote:

> I don't see why this needs to be folded into the Xerces build.

I don't see why they have to -- if you think of a build of modules as a set
of build targets, you should be able to build what you want out of the
source tree. Just want to build the main parser + sax -- "./build sax"

In fact, I could see a Xerces-Light dist that just had a few things in it, a
Xerces-Full dist that had *everything* in it -- and whatever other dists
people thought valuable. But mostly in the range between light and full, I'd
like app programmers to choose their poison.

> In fact, I'd like to move away from "everything *and* the
> kitchen sink" approach where all donations get rolled into the
> main code release.

Here I disagree with you. Coordinating the code across all these spaces
would be more of a pain than it's worth. The code should be in separate
*packages* -- and build into separate jars imho. But it should be possible
to "./build all" and get the whole shebang.

.duncan


Re: Xerces Redesign: REQUIREMENTS

Posted by Andy Clark <an...@apache.org>.
Brett McLaughlin wrote:
> There's also been agreement (or at least lack of disagreement) that,
> given the ability to do it in a modular fashion, JDOM will be supported.

I don't see why this needs to be folded into the Xerces build.
In fact, I'd like to move away from "everything *and* the
kitchen sink" approach where all donations get rolled into the
main code release. I want to see separate modules. Whether 
these separate modules are hosted on the XML Apache site is
another question.

> I'm curious whether you intended this list to be ordered by priority.
> There are big issues as to whether performance or simplicity is more
> important.

The list isn't in any particular priority order.

> Would it be acceptable to support SAX 1.0 purely through the
> ParserAdapter class that SAX 2.0 comes with? That would allow us to
> support it, albeit via a slightly slower and less "native" route.

Perhaps. These are all issues that we need to work through
and vote on.
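One detail on the SAX 1.0 question: in the SAX 2.0 helpers, org.xml.sax.helpers.ParserAdapter wraps a SAX 1.0 Parser so that it looks like a SAX 2.0 XMLReader, while org.xml.sax.helpers.XMLReaderAdapter goes the other way and exposes a SAX 2.0 XMLReader through the SAX 1.0 Parser interface. For a parser that is natively SAX 2.0, XMLReaderAdapter is the one that would provide SAX 1.0 support. A minimal Java sketch, assuming the JDK's built-in JAXP parser stands in for a SAX 2.0 Xerces; the SAX 1.0 types it uses (Parser, HandlerBase, AttributeList) are deprecated but still shipped with the JDK.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.Parser;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderAdapter;

public class Sax1OverSax2 {
    /** Parses through the SAX 1.0 interface and returns element names in document order. */
    @SuppressWarnings("deprecation")
    public static List<String> elementNames(String xml) throws Exception {
        // A native SAX 2.0 reader (here, whatever JAXP provides).
        XMLReader sax2Reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // The adapter presents it as a SAX 1.0 Parser.
        Parser sax1Parser = new XMLReaderAdapter(sax2Reader);
        List<String> names = new ArrayList<>();
        sax1Parser.setDocumentHandler(new HandlerBase() {
            @Override
            public void startElement(String name, AttributeList attributes) {
                names.add(name);          // SAX 1.0 callback: qualified name only
            }
        });
        sax1Parser.parse(new InputSource(new StringReader(xml)));
        return names;
    }
}
```

A SAX 1.0 application keeps its DocumentHandler code unchanged; the cost is an extra layer of indirection on every event.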

> that his list was the "definitive" list, as it is a lot larger and includes
> things like XLink, XPointer, XPath, etc. that far exceed Andy's.

There is a lot of overlap -- I just wanted to post the
requirements we were using to start our re-design discussion.

> Without looking too deeply, Andy, have you started using ints, interned
> Strings, or a RecyclableString type of construct? This is one of the

Strings all the way.

In the design that I posted, there is a SymbolTable class
that manages symbols to perform the "intern" of various
strings found in documents. The actual hashing is delegated
to a SymbolHasher interface, which 1) allows the hashing
function to be replaced, and 2) doesn't require you to
create a String just to call intern().
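A minimal Java sketch of a symbol table with a pluggable hasher, assuming the shape described above: the scanner passes a range of its character buffer, the table hashes and compares characters in place, and a String is created (and interned) only the first time a symbol is seen. The SymbolHasher and addSymbol signatures are assumptions for illustration, not the checked-in Xerces 2 code, and the table uses a fixed number of buckets with no rehashing.

```java
/** Hypothetical symbol table that interns names straight from a char buffer. */
public class SymbolTable {
    /** Pluggable hash over a character range; no String is needed to compute it. */
    public interface SymbolHasher {
        int hash(char[] buffer, int offset, int length);
    }

    /** Default hasher: the same polynomial that String.hashCode() uses. */
    public static final SymbolHasher DEFAULT_HASHER = (buffer, offset, length) -> {
        int h = 0;
        for (int i = 0; i < length; i++) {
            h = 31 * h + buffer[offset + i];
        }
        return h;
    };

    /** Chained bucket entry holding the canonical String and its characters. */
    private static final class Entry {
        final String symbol;
        final char[] chars;
        final Entry next;

        Entry(String symbol, Entry next) {
            this.symbol = symbol;
            this.chars = symbol.toCharArray();
            this.next = next;
        }
    }

    private final Entry[] buckets;
    private final SymbolHasher hasher;

    public SymbolTable(int size, SymbolHasher hasher) {
        this.buckets = new Entry[size];
        this.hasher = hasher;
    }

    public SymbolTable() {
        this(101, DEFAULT_HASHER);
    }

    /** Returns the canonical String for buffer[offset, offset + length). */
    public String addSymbol(char[] buffer, int offset, int length) {
        int bucket = (hasher.hash(buffer, offset, length) & 0x7FFFFFFF) % buckets.length;
        for (Entry e = buckets[bucket]; e != null; e = e.next) {
            if (matches(e.chars, buffer, offset, length)) {
                return e.symbol;                       // hit: no allocation at all
            }
        }
        String symbol = new String(buffer, offset, length).intern();  // first sighting only
        buckets[bucket] = new Entry(symbol, buckets[bucket]);
        return symbol;
    }

    private static boolean matches(char[] chars, char[] buffer, int offset, int length) {
        if (chars.length != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (chars[i] != buffer[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because every symbol comes back as the same interned instance, later stages can compare element and attribute names with == instead of equals(), which recovers much of the speed of int symbols while keeping names readable as Strings.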

-- 
Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org

Re: Xerces Redesign: REQUIREMENTS

Posted by James Duncan Davidson <ja...@eng.sun.com>.
on 7/18/00 5:07 PM, Brett McLaughlin at brett.mclaughlin@lutris.com wrote:

> It seems that the requirements that Andy keeps mentioning need to be
> merged with the online version that Ed is keeping. My understanding is
> that his list was the "definitive" list, as it is a lot larger and includes
> things like XLink, XPointer, XPath, etc. that far exceed Andy's.
> Where they are disjoint, we need to discuss and reconcile. At that point,
> I'd hope we can use that one list, as keeping two lists is really
> confusing ;-)

+1

We need not create more work than necessary. I kicked off with a short list
-- Ted and Ed have been keeping score since then, and I think that's a good
thing since there's so much paranoia about com vs com here. :)


.duncan


Re: Xerces Redesign: REQUIREMENTS

Posted by James Duncan Davidson <ja...@eng.sun.com>.
on 7/18/00 7:15 PM, Ed Staub at estaub@mediaone.net wrote:
 
> I plan to do the next revision this weekend, for posting on Monday when
> Ted gets back and can check it in and post to the website.

Righto. :)

.duncan


RE: Xerces Redesign: REQUIREMENTS

Posted by Ed Staub <es...@mediaone.net>.
Brett McLaughlin wrote:
>
> It seems that the requirements that Andy keeps mentioning need to be
> merged with the online version that Ed is keeping. My understanding is
> that his list was the "definitive" list, as it is a lot larger and includes
> things like XLink, XPointer, XPath, etc. that far exceed Andy's.

I (and Ted L.) plan to integrate new mail into the requirements list
on a regular basis.  I expect to include a change summary at each revision.

I plan to do the next revision this weekend, for posting on Monday when
Ted gets back and can check it in and post to the website.

-Ed Staub


Re: Xerces Redesign: REQUIREMENTS

Posted by Brett McLaughlin <br...@lutris.com>.

Andy Clark wrote:
> 
> Requirements for any new design should center on customer
> requirements. Luckily, we're all customers of Xerces so we
> already have a good idea regarding what is important. But I
> also don't want to neglect other people that are building
> commercial products as well as the server-oriented folks
> where performance is paramount.
> 
> First, I would list the following basic requirements:
> 
>   Standards Compliance
>     XML 1.0
>     Namespaces 1.0
>     DOM Level 1, Level 2
>     SAX 1.0, 2.0

There's also been agreement (or at least lack of disagreement) that,
given the ability to do it in a modular fashion, JDOM will be supported.

>     XML Schema
>   Performance
>   Simplicity
>   Extensibility
>   Maintainability

I'm curious whether you intended this list to be ordered by priority.
There are big issues as to whether performance or simplicity is more
important.


> 
> I think we can pretty much agree on all of them with perhaps
> SAX 1.0 being an exception. I would like to see Xerces

Would it be acceptable to support SAX 1.0 purely through the
ParserAdapter class that SAX 2.0 comes with? That would allow us to
support it, albeit via a slightly slower and less "native" route.

> support it while others feel everyone should move up to
> using SAX 2.0. This is an issue we can discuss, though.
> 
> Also, the design should accommodate the following features
> and allow new features to be added with ease:
> 
>   Core features and properties
>   Error handling
>   Grammar access
>   Grammar caching
> 
> This set of requirements was taken into consideration when
> making the skeleton that is checked in under the Xerces 2
> branch. See my previous posting for the overall design.
> 
> Has anyone had a chance to look it over yet? It might be
> easier to read if you detach the HTML and the CSS so that
> you can see the highlighting.

It seems that the requirements that Andy keeps mentioning need to be
merged with the online version that Ed is keeping. My understanding is
that his list was the "definitive" list, as it is a lot larger and includes
things like XLink, XPointer, XPath, etc. that far exceed Andy's.
Where they are disjoint, we need to discuss and reconcile. At that point,
I'd hope we can use that one list, as keeping two lists is really
confusing ;-)

Without looking too deeply, Andy, have you started using ints, interned
Strings, or a RecyclableString type of construct? This is one of the
big, core issues, and I'm curious whether you went with ints, as
Xerces 1 did, or whether you even got that far. The impression I am getting
(and, btw, agree with) is that interned Strings are the way people want
to go - it is the best compromise of performance and clarity. Also,
there was talk of using an interface that can be implemented
differently, although that seemed to be something that could cause
cross-talk (I use X impl, you use Y impl, we get confused).

Thanks for your thoughts...

-Brett

> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * andyc@apache.org
> 

-- 
Brett McLaughlin, Enhydra Strategist
Lutris Technologies, Inc. 
1200 Pacific Avenue, Suite 300 
Santa Cruz, CA 95060 USA 
http://www.lutris.com
http://www.enhydra.org