Posted to j-dev@xerces.apache.org by Jeffrey Rodriguez <je...@hotmail.com> on 2000/07/12 07:01:23 UTC

Re: XRI requirements

Ed,

I would like to add to the list of requirements the following:


Grammar Caching
---------------

This would allow the parser to be configured for server applications
where we have small messages that need to be validated and we do
not want to rebuild the internal Grammar representation every time
we get a new message.

Currently we create an internal representation of the XML grammar
we are using (DTD, Schema, or whatever else is ever defined as a
grammar to describe XML), parse the small message, and then throw
away the internal Grammar representation until the next message
comes.

With this feature, we could cache the Grammar(s) either
programmatically or as they are created for the first time.

The next time a cached DTD is encountered, we reuse it so we do not
have to pay the penalty of creating a new Grammar.
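A minimal sketch of this idea in Java, assuming today's JAXP validation API (javax.xml.validation, which postdates this thread and compiles W3C XML Schema rather than DTDs). The GrammarCache class and its grammarSource resolver are hypothetical names for illustration: each grammar is compiled the first time it is requested and the compiled Schema is reused for every later message.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical grammar cache: compile each grammar once, reuse it for every message. */
class GrammarCache {
    private final Map<String, Schema> cache = new ConcurrentHashMap<>();
    // Hypothetical resolver from a grammar key (e.g. a system ID) to its schema text.
    private final Function<String, String> grammarSource;
    // Counts how often we actually paid the cost of building a grammar.
    private final AtomicInteger compiles = new AtomicInteger();

    GrammarCache(Function<String, String> grammarSource) {
        this.grammarSource = grammarSource;
    }

    /** Returns the cached Schema for key, compiling it only on first request. */
    Schema get(String key) {
        return cache.computeIfAbsent(key, k -> {
            try {
                SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
                Schema schema = factory.newSchema(new StreamSource(new StringReader(grammarSource.apply(k))));
                compiles.incrementAndGet();
                return schema;
            } catch (SAXException e) {
                throw new IllegalStateException("grammar did not compile: " + k, e);
            }
        });
    }

    /** Validates one small message against the cached grammar; returns false if it is invalid. */
    boolean validate(String key, String message) {
        try {
            // Schema is shared and thread-safe; Validator is not, so create one per message.
            get(key).newValidator().validate(new StreamSource(new StringReader(message)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int compileCount() {
        return compiles.get();
    }
}
```

The design choice worth noting: the expensive step (newSchema) happens once per key, while the cheap per-message step (newValidator plus validate) runs for every incoming message, which is the trade-off the proposal is after.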

For fast server applications that parse and validate documents,
this would be a performance winner.

Of course, all of this would stay within the confines of our pluggable
architecture, so small bare-bones parsers don't have to validate, cache,
or deal with grammars at all, only with basic well-formedness checks.

On the server side, the same parser configured to validate could then
handle validation efficiently.
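The layering described above (a bare-bones configuration that only checks well-formedness, with extra stages plugged in as needed) can be sketched with the standard SAX2 filter mechanism, org.xml.sax.helpers.XMLFilterImpl, used here as a stand-in for the proposed pluggable architecture. ElementCounter and Pipeline are hypothetical names; the filter sits between the scanner and the application's handler and forwards every event after doing its own work.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical pluggable stage sitting between the scanner and the application handler. */
class ElementCounter extends XMLFilterImpl {
    int elements = 0;

    ElementCounter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        elements++;                                      // this stage's own work
        super.startElement(uri, localName, qName, atts); // forward the event downstream
    }
}

class Pipeline {
    /** Bare-bones configuration: no validation, no grammar, only well-formedness checks. */
    static XMLReader wellFormedOnly() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false);
        return factory.newSAXParser().getXMLReader();
    }

    /** Parses xml through the filter chain; throws if the document is not well-formed. */
    static int countElements(String xml) {
        try {
            ElementCounter counter = new ElementCounter(wellFormedOnly());
            DefaultHandler app = new DefaultHandler();   // stands in for the application
            counter.setContentHandler(app);
            counter.setErrorHandler(app);                // DefaultHandler rethrows fatal errors
            counter.parse(new InputSource(new StringReader(xml)));
            return counter.elements;
        } catch (Exception e) {
            throw new IllegalArgumentException("parse failed", e);
        }
    }
}
```

A validating server configuration would keep the same chain and swap in a differently configured parent reader, so the filter stage does not need to know whether validation is on.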

Thanks,
                Jeffrey Rodriguez
                IBM Silicon Valley




>From: "Ed Staub" <es...@mediaone.net>
>Reply-To: general@xml.apache.org
>To: <ge...@xml.apache.org>
>Subject: XRI requirements
>Date: Wed, 12 Jul 2000 00:03:53 -0400
>
>Here's a quick copy-and-paste list of the proposed
>XRI/Spinnaker/Xerces-2/X2/XRay requirements which have been discussed by
>various people over the last few days.  (Ok, I just threw in "X2" and
>"XRay"!;)  I did some light editing for clarity.  I probably missed a few!
>
>If folks would like me to do more with the proposed requirements list,
>please let me know and I'll propose some procedures and (better!)
>formatting.  But I suspect that it may be desirable to have a more
>well-known and experienced person recording the proposed requirements.
>It's clearly a sensitive area where a person widely believed to be impartial
>would be very valuable.
>
>Regardless... what's missing?
>
>-Ed Staub
>
>
>FEATURES and STANDARDS COMPLIANCE
>---------------------------------
>
>Validating XML 1.0 support
>
>Namespace support
>
>SAX2 support
>
>DOM Level 2 support
>
>XML Schema support
>
>XPath support
>
>XInclude support
>
>Write validation of a DOM tree or revalidation
>
>Grammar access for both DTD and Schema.
>
>Document-order indexes or API as a DOM extension.
>
>[optional] isWhite() method as a DOM extension (simply telling whether or
>not the text contains any non-whitespace), for performance reasons.
>
>Serialization support, as is currently in Assaf's classes.
>
>PERFORMANCE
>-----------
>
>No significant speed penalty for unused features, notably validation.
>
>Best-of-breed performance across all JITs (not just HotSpot).
>
>Grammar caching: a pre-compiled grammar can be cached to validate instance
>documents over and over again without re-reading the DTD or Schema file, or
>even recompiling it.
>
>A configurable parser, in line with the requirements of Modularity and
>pluggable Validator somebody posted before.  Not only should the validator
>be pluggable, but also if there is a need for a parser without validation at
>all, there should be some parser configuration which is really lightweight
>and just scans, checks well-formedness, and generates SAX2 events.  Or even
>fancier, a brand new functional module can be plugged in BETWEEN components
>as long as it implements certain interfaces.  [Eric Ye]
>
>Mid to upper range validation performance across all JVMs.
>
>Read-only, memory-conservative, high-performance DOM subset (for Xalan et al.)
>
>Parsing in chunks [Scott Boag]
>
>A reasonable core (fast!) upon which to layer JDOM [Brett M.]
>
>Some sort of weak reference, where nodes could be released if not
>referenced, and then rebuilt if requested.  For performance and memory
>footprint.
>
>Some sort of way to tell if a SAX char buffer is going to be
>overwritten, so data doesn't have to be copied until this occurs.
>
>Small core footprint for standalone, compiled stylesheet capability, for
>use on small devices.  This would need to include the Serializer.  I'm not
>sure if this should really be a separate micro-parser?
>
>Smallest possible size. This means small distribution size (JAR
>file) and small memory footprint.
>
>
>OTHER QUALITIES
>----------------
>
>Clarity of design and code sufficient to encourage wide participation.
>Code should always be readable and maintainable.
>
>Design portable to C++ (or elsewhere)
>
>More documentation on the Apache web site, not only for users but also for
>developers.
>
>Testcases for both conformance and benchmarking.
>
>
>===============================
>
>NOMINAL NON-REQUIREMENTS:
>
>Someone stated, without dissent (yet!), that these do not need to be
>supported:
>
>	SAX 1 support
>
>
>---------------------------------------------------------------------
>In case of troubles, e-mail:     webmaster@xml.apache.org
>To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
>For additional commands, e-mail: general-help@xml.apache.org
>
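The optional isWhite() extension in the quoted list is simple to picture. A minimal sketch, assuming a hypothetical TextUtil helper rather than any actual Xerces API: it treats only the four XML 1.0 whitespace characters (space, tab, CR, LF) as white and stops at the first non-whitespace character, so no trimmed copy of the string is built.

```java
import org.w3c.dom.Text;

/** Hypothetical helper illustrating the proposed isWhite() DOM extension. */
final class TextUtil {
    private TextUtil() {}

    /** True if text contains only XML whitespace (#x20, #x9, #xD, #xA); empty text counts as white. */
    static boolean isWhite(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') {
                return false; // stop at the first non-whitespace character
            }
        }
        return true;
    }

    /** Convenience overload for a DOM Text node; getData() is inherited from CharacterData. */
    static boolean isWhite(Text node) {
        return isWhite(node.getData());
    }
}
```

DOM Level 3 later added Text.isElementContentWhitespace(), which answers a narrower question (whether the whitespace is ignorable according to the grammar), so a purely lexical check like this one remains a separate concern.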

________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com

Re: [Xerces2] Requirements list; 'Xerces2'?; cross-posting

Posted by Stefano Mazzocchi <st...@apache.org>.

Arnaud Le Hors wrote:
> 
> As far as defining what comes out of the parser goes we must put the XML
> Information Set [1] on the list of requirements.
> 
> [1] http://www.w3.org/TR/xml-infoset

Oh, totally +1, it was obvious to me.

-- 
Stefano Mazzocchi      One must still have chaos in oneself to be
                          able to give birth to a dancing star.
<st...@apache.org>                             Friedrich Nietzsche
--------------------------------------------------------------------
 Missed us in Orlando? Make it up with ApacheCON Europe in London!
------------------------- http://ApacheCon.Com ---------------------

Re: [Xerces2] Requirements list; 'Xerces2'?; cross-posting

Posted by Arnaud Le Hors <le...@us.ibm.com>.

As far as defining what comes out of the parser goes we must put the XML
Information Set [1] on the list of requirements.

[1] http://www.w3.org/TR/xml-infoset
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

Re: [Xerces2] Requirements list; 'Xerces2'?; cross-posting

Posted by Arnaud Le Hors <le...@us.ibm.com>.

Ed Staub wrote:
> 
> If no one better-suited has volunteered to run the proposed requirements
> list in the interim, I'll post an updated list tonight.

Great!

> Do we have consensus on "Xerces2" as a name?
> All of the very recent mail has been in favor.
> Should someone actively poll the committers?

I hope nobody will want to waste any more time on looking for another
name when there are so many more important issues to work on. Let's
focus our energy on the requirements!

> I think this discussion should probably move to xerces-j-dev@xml.apache.org
> (ONLY), not general@xml.apache.org.  I'm going to start posting just on
> xerces-j-dev@xml.apache.org.  OK?

I agree.
-- 
Arnaud  Le Hors - IBM Cupertino, XML Technology Group

Re: [Xerces2] Requirements list; 'Xerces2'?; cross-posting

Posted by Octav Chipara <oc...@cse.unl.edu>.


Hi Ed,

This is great ...

Octav

> If no one better-suited has volunteered to run the proposed requirements
> list in the interim, I'll post an updated list tonight.
> 
> I'd like to get the list into HTML, to allow linking back to the mail
> archive, and onto the website (to reduce mail bandwidth).
> 
> I'd also like to get it into CVS (to preserve history).
> 
> Can I get a volunteer committer and a volunteer with website access who will
> post these for me?
> 
> ------------------------------------------------------
> 
> Do we have consensus on "Xerces2" as a name?
> All of the very recent mail has been in favor.
> Should someone actively poll the committers?
> 
> ------------------------------------------------------
> 
> There are many proposed requirements.
> Some are in conflict.
> Some will never be implemented.
> For the moment, it makes sense to just collect all the proposals.
> 
> But it seems sensible to winnow the list soon.
> Otherwise, the effort will become really poorly defined.
> Do people agree?
> If so, how do we do it?
> Here's a process proposal, based on what I've seen at W3C.
> 
> Break up requirements into logical groups.
> Set a period of a week or two to discuss each group.
> 
> 	Midway in the period,
> 	take an early straw vote to see where people are at.
> 
> 	At the end of the period,
> 	vote on an action item for each requirement in the group.
> 
> 	In general, on requirements where there's no consensus
> 	at the end of the period, simply move on to the next group.
> 
> After we've gotten through everything once,
> go back to the hard issues and try to resolve them.
> 
> This process requires a facilitator to lay out the groups, call the votes, etc.
> 
> -------------------------------------------------------
> 
> I think this discussion should probably move to xerces-j-dev@xml.apache.org
> (ONLY), not general@xml.apache.org.  I'm going to start posting just on
> xerces-j-dev@xml.apache.org.  OK?
> 
> 
> -Ed Staub

[Xerces2] Requirements list; 'Xerces2'?; cross-posting

Posted by Ed Staub <es...@mediaone.net>.

If no one better-suited has volunteered to run the proposed requirements
list in the interim, I'll post an updated list tonight.

I'd like to get the list into HTML, to allow linking back to the mail
archive, and onto the website (to reduce mail bandwidth).

I'd also like to get it into CVS (to preserve history).

Can I get a volunteer committer and a volunteer with website access who will
post these for me?

------------------------------------------------------

Do we have consensus on "Xerces2" as a name?
All of the very recent mail has been in favor.
Should someone actively poll the committers?

------------------------------------------------------

There are many proposed requirements.
Some are in conflict.
Some will never be implemented.
For the moment, it makes sense to just collect all the proposals.

But it seems sensible to winnow the list soon.
Otherwise, the effort will become really poorly defined.
Do people agree?
If so, how do we do it?
Here's a process proposal, based on what I've seen at W3C.

Break up requirements into logical groups.
Set a period of a week or two to discuss each group.

	Midway in the period,
	take an early straw vote to see where people are at.

	At the end of the period,
	vote on an action item for each requirement in the group.

	In general, on requirements where there's no consensus
	at the end of the period, simply move on to the next group.

After we've gotten through everything once,
go back to the hard issues and try to resolve them.

This process requires a facilitator to lay out the groups, call the votes, etc.

-------------------------------------------------------

I think this discussion should probably move to xerces-j-dev@xml.apache.org
(ONLY), not general@xml.apache.org.  I'm going to start posting just on
xerces-j-dev@xml.apache.org.  OK?


-Ed Staub

RE: XRI requirements

Posted by James Snell <js...@lemoorenet.com>.

I must apologize if this suggestion has already been made (I have to admit
that I haven't been following the discussion as closely as I should), but
I think it would be cool for the parser to support loading and validating
only certain requested portions of a document, a useful feature when
dealing with large XML databases, for example.
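One way to approximate this is a pull parser. A minimal sketch using the StAX API (javax.xml.stream, a later standard swapped in here purely for illustration); PartialLoader and textOf are hypothetical names. It streams through the whole document, so well-formedness is still checked everywhere, but only the text of the requested elements is materialized. It does not validate the selected portions, which would need grammar support on top.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical loader that keeps only the requested parts of a large document. */
class PartialLoader {
    /** Returns the text of every element named wanted; assumes those elements hold text only. */
    static List<String> textOf(String xml, String wanted) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals(wanted)) {
                        // getElementText reads up to the matching end tag and fails on child elements.
                        out.add(r.getElementText());
                    }
                    // Everything else is scanned and discarded without building a tree.
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed", e);
        }
        return out;
    }
}
```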

- James

-----Original Message-----
From: Jeffrey Rodriguez [mailto:jeffreyr_97@hotmail.com]
Sent: Tuesday, July 11, 2000 10:01 PM
To: xerces-j-dev@xml.apache.org; general@xml.apache.org
Subject: Re: XRI requirements


Ed,

I would like to add to the list of requirements the following:


Grammar Caching
---------------

This would allow the parser to be set for server applications
where we have small message that need to be validated by a
parser and we do not want to reset the Grammar internal
representation every time that we get a new message.

We currently create a Grammar internal representation of
the XML Grammar that we are using ( DTD, Schema, or whatever
else will ever be defined a Grammar to describe XML) we parse
the small message and then we throw away the internal Grammar
representation until the next message comes.

With this feature, we could cache the Grammar(s) either
programmatically or as they are created for the first time.

The next time a cached DTD is found we reuse it so we do not
have to pay the penalty of creating a new Grammar.

Of fast server application that parser and try to validate
documents this would be a performance winner.

Of course all this within the confines of our pluggable architecture
so small barebone parsers don't have to validate, cache or deal
with Grammar only with basic well formdness check.

In the Server side the same parser configure to validate with
could deal with validation in an efficient way.

Thanks,
                Jeffrey Rodriguez
                IBM Silicon Valley




>From: "Ed Staub" <es...@mediaone.net>
>Reply-To: general@xml.apache.org
>To: <ge...@xml.apache.org>
>Subject: XRI requirements
>Date: Wed, 12 Jul 2000 00:03:53 -0400
>
>Here's a quick copy-and-paste list of the proposed
>XRI/Spinnaker/Xerces-2/X2/XRay requirements which have been discussed by
>various people over the last few days.  (Ok, I just threw in "X2" and
>"XRay"!;)  I did some light editing for clarity.  I probably missed a few!
>
>If folks would like me to do more with the proposed requirements list,
>please let me know and I'll propose some procedures and (better!)
>formatting.  But I suspect that it may be desirable to have a more
>well-known and experienced person recording the proposed requirements.
>It's
>clearly a sensitive area where a person widely believed to be impartial
>would be very valuable.
>
>Regardless... what's missing?
>
>-Ed Staub
>
>
>FEATURES and STANDARDS COMPLIANCE
>---------------------------------
>
>Validating XML 1.0 support
>
>Namespace support
>
>SAX2 support
>
>DOM Level 2 support
>
>XML Schema support
>
>XPath support
>
>XInclude support
>
>Write validation of a DOM tree or revalidation
>
>Grammar access for both DTD and Schema.
>
>Document-order indexes or API as a DOM extension.
>
>[optional] isWhite() method as a DOM extensions (pure telling of
>whether or not the text contains non-whitespace), for performance reasons.
>
>Serialization support, as is currently in Assaf's classes.
>
>PERFORMANCE
>-----------
>
>No significant speed penalty for unused features, notably validation.
>
>Best-of-breed performance across all JIT's (not just Hotspot).
>
>Grammar caching, pre-compiled grammar can be cached to validate instance
>documents over and over again without re-reading the DTD and Schema file or
>even compile it.
>
>A configurable parser, in line with the requirements of Modularity and
>pluggable Validator somebody posted before.  Not only should the validator
>be pluggable, but also if there is a need for a parser without validation
>at
>all, there should be some parser configuration which is really lightweight
>and just scans, checks well-formedness, and generates SAX2 events.  Or even
>fancier, a brand new functional module can be plugged in BETWEEN components
>as long as it implements certain interfaces.  [Eric Ye]
>
>Mid to upper range validation performance across all JVMs.
>
>Read-only, memory conservative, high performance DOM subset (for Xalan et
>al)
>
>Parsing in chunks [Scott Boag]
>
>A reasonable core (fast!) upon which to layer JDOM [Brett M.]
>
>Some sort of weak reference, where nodes could be released if not
>referenced, and then rebuilt if requested.  For performance and memory
>footprint.
>
>Some sort of way to tell if a SAX char buffer is going to be
>overwritten, so data doesn't have to be copied until this occurs.
>
>Small core footprint for standalone, compiled stylesheet capability, for
>use on small devices.  This would need to include the Serializer.  I'm not
>sure if this should really be a separate micro-parser?
>
>Smallest possible size. This means small distribution size (JAR
>file) and small memory footprint.
>
>
>OTHER QUALITIES
>----------------
>
>Clarity of design and code sufficient to encourage wide participation.
>Code should always be readable and maintainable.
>
>Design portable to C++ (or elsewhere)
>
>More documentation on the Apache web site not only for users but for
>DEVELOPERs.
>
>Testcases for both conformance and benchmarking.
>
>
>===============================
>
>NOMINAL NON-REQUIREMENTS:
>
>Someone stated, without dissent (yet!), that these do not need to be
>supported:
>
>	SAX 1 support
>
>
>---------------------------------------------------------------------
>In case of troubles, e-mail:     webmaster@xml.apache.org
>To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
>For additional commands, e-mail: general-help@xml.apache.org
>

________________________________________________________________________
Get Your Private, Free E-mail from MSN Hotmail at http://www.hotmail.com


---------------------------------------------------------------------
In case of troubles, e-mail:     webmaster@xml.apache.org
To unsubscribe, e-mail:          general-unsubscribe@xml.apache.org
For additional commands, e-mail: general-help@xml.apache.org