Posted to dev@commons.apache.org by Simon Kitching <sk...@apache.org> on 2005/01/31 11:09:28 UTC

[digester] initial code for Digester2.0

Hi,

As I mentioned a few months ago, I've been working on some ideas for
Digester 2.0. I've put some code and notes up on 
  http://www.apache.org/~skitching

Comments from all commons-dev subscribers are welcome, but particularly
from Craig and Robert.

The RELEASE-NOTES.txt file gives a brief overview of what I've done so
far, and what I personally would like to see. 

This is *not* intended to be final code, but rather to solicit yes/no
feedback on what people like/dislike about the posted code. As you will
see, many parts are still missing and I personally would still like to
see significant changes even to parts already included (see
RELEASE-NOTES.txt). However the basic structure is there, including a
number of controversial (I expect) name changes.

Once we get the general opinions out, and I have massaged the code into
something that meets general consensus, I hope to then add it to the
sandbox for everyone to hack away at.

Cheers,

Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

[digester] org.apache.commons.digester2

Posted by Oliver Zeigermann <ol...@gmail.com>.

Big +1 for org.apache.commons.digester2!

Oliver


Re: [digester] initial code for Digester2.0

Posted by Emmanuel Bourg <eb...@apache.org>.

Simon Kitching wrote:

> BTW, should we contact the car companies, and tell them their customers
> prefer suffixes?
>   "Focus Ford"
>   "Mustang Ford"
>   "Thunderbird Ford"
> 
> (I'm mostly kidding...)

I think the analogy is incomplete, you forgot the object being qualified
by the brand. Would you say

"Car Ford Focus"
"Car Ford Mustang"
"Car Ford Thunderbird"

or

"Ford Focus Car"
"Ford Mustang Car"
"Ford Thunderbird Car"

?

:)

Emmanuel Bourg


Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Mon, 2005-01-31 at 21:43 -0700, Wendy Smoak wrote:
> From: "Simon Kitching" <sk...@apache.org>
> 
> > Ok, we'll see what the general consensus is. I happen to personally like
> > prefixes rather than suffixes, but will go with the majority opinion.
> 
> Another vote for suffix - I prefer CallMethodAction to ActionCallMethod.

BTW, should we contact the car companies, and tell them their customers
prefer suffixes?
  "Focus Ford"
  "Mustang Ford"
  "Thunderbird Ford"

(I'm mostly kidding...)

Regards,

Simon



Re: [digester] initial code for Digester2.0

Posted by ja...@wendysmoak.com.

"Simon Kitching" wrote:

> Does this mean you prefer Action to Rule? I certainly expect to hear
> from people who want to keep the current names...

No preference there, [and I'll get used to prefix/suffix, whichever way it
goes, it's not THAT big of a deal, but you asked...]

-- 
Wendy Smoak


Re: [digester] initial code for Digester2.0

Posted by Reid Pinchback <re...@yahoo.com>.

--- Simon Kitching <sk...@apache.org> wrote:
> Does this mean you prefer Action to Rule? I certainly expect to hear
> from people who want to keep the current names...

I'm not wedded to "Rule" but I do have a concern about "Action".
I suspect it could make Struts code rather confusing.


Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Mon, 2005-01-31 at 21:43 -0700, Wendy Smoak wrote:
> From: "Simon Kitching" <sk...@apache.org>
> 
> > Ok, we'll see what the general consensus is. I happen to personally like
> > prefixes rather than suffixes, but will go with the majority opinion.
> 
> Another vote for suffix - I prefer CallMethodAction to ActionCallMethod.

Ok. 

Does this mean you prefer Action to Rule? I certainly expect to hear
from people who want to keep the current names...

> 
> Will ActionFactory have all of the available Action constructor signatures?

Yes. I just don't want to implement them all until the final names have
been decided on...

Regards,

Simon



Re: [digester] initial code for Digester2.0

Posted by Wendy Smoak <ja...@wendysmoak.com>.

From: "Simon Kitching" <sk...@apache.org>

> Ok, we'll see what the general consensus is. I happen to personally like
> prefixes rather than suffixes, but will go with the majority opinion.

Another vote for suffix - I prefer CallMethodAction to ActionCallMethod.

Will ActionFactory have all of the available Action constructor signatures?

Thanks,
Wendy Smoak



Re: [digester2] performance of ns-aware parsing

Posted by Simon Kitching <sk...@apache.org>.

On Sun, 2005-02-06 at 13:02 -0800, Reid Pinchback wrote:
> --- Simon Kitching <sk...@apache.org> wrote:
> > > I stopped using belief as a measurement of code a long time
> > > ago.  Usually only works when I wrote all the code.  :-)
> > > I'll cook up an experiment and see what I can come up with
> > > in the way of timing information.
> > 
> > That would be excellent. I look forward to seeing the results..
> 
> Actually, an experiment implies a question to be answered, and
> while this has been an interesting back-and-forth, not sure
> we really have a question to answer.  This whole thing began
> with me simply asking a question about something you'd
> put in your readme file on the upcoming work.  Practically
> I don't see you not expecting a namespace-aware parser, the
> question is really more one of the user of Digester2 deciding
> if they are using namespace features.  While we could do
> timing tests to help people understand what the impact may
> or may not be of using NS in the documents they parse, it
> obviously has nothing to do with whether or not you are
> going to expect a parser to handle NS if the docs contain NS.
> That will be the developer's problem, not yours, yes?

Hi Reid,

I don't quite understand the above.

You mean these are the questions?
* should people avoid creating xml documents that use namespaces
  if they care about the performance of later parsing the doc?
* Is there a significant performance benefit in parsing 
  non-namespaced xml with a non-namespace-aware parser?
* Is there a significant performance benefit in parsing
  namespace-using-xml with a non-namespace-aware parser
  (yecch!).

The first is an interesting question, and is partially related to the
third one in that it gives people an *option* (though not a good one
IMHO) to parse the document fast. But mostly I agree this is the
developer's problem, not digester's. If we can give a hint somewhere in
our docs about parser performance with/without ns, though, I'm sure
people would appreciate it.

For either of the last two, the answer is relevant to digester; if the
answer to either is yes, then I would support allowing a
non-namespace-aware parser to be used with digester. By support, I mean
writing code that allows instantiation of ns-aware or non-ns-aware
parser, code that looks for localname/qname, support in the RuleManager
classes for matching such elements, and unit tests to test it all.
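
The "code that looks for localname/qname" part can be sketched as
below. This is only an illustration under assumptions: the class and
method names (ElementNames, nameOf, firstElementName) are hypothetical
and are not taken from the Digester2 code. It relies on the SAX2
convention that a parser which is not namespace-aware may leave the
local name empty and supply only the qualified name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Digester: picks the name to match on.
final class ElementNames {

    /** Prefer the local name; fall back to the qualified name if absent. */
    static String nameOf(String uri, String localName, String qName) {
        if (localName == null || localName.isEmpty()) {
            // Typically what a non-namespace-aware parser supplies.
            return qName;
        }
        return localName;
    }

    /** Parses xml and returns the name chosen for the first element seen. */
    static String firstElementName(String xml, boolean nsAware) {
        final String[] result = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(nsAware);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (result[0] == null) {
                            result[0] = nameOf(uri, localName, qName);
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        String xml = "<foo:item xmlns:foo='urn:example:ns'/>";
        System.out.println(firstElementName(xml, true));
        System.out.println(firstElementName(xml, false));
    }
}
```

With the non-namespace-aware parser the matched name still carries the
prefix, which is exactly the extra case the pattern-matching code and
unit tests would have to cover.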

Currently, I'm not hugely motivated to test either of the last two
scenarios, as I *believe* the answer to both is no, but if someone else
does I'll look at the results with interest.

Is this what you meant?

Regards,

Simon


Re: [digester2] performance of ns-aware parsing

Posted by Reid Pinchback <re...@yahoo.com>.

--- Simon Kitching <sk...@apache.org> wrote:
> > I stopped using belief as a measurement of code a long time
> > ago.  Usually only works when I wrote all the code.  :-)
> > I'll cook up an experiment and see what I can come up with
> > in the way of timing information.
> 
> That would be excellent. I look forward to seeing the results..

Actually, an experiment implies a question to be answered, and
while this has been an interesting back-and-forth, not sure
we really have a question to answer.  This whole thing began
with me simply asking a question about something you'd
put in your readme file on the upcoming work.  Practically
I don't see you not expecting a namespace-aware parser, the
question is really more one of the user of Digester2 deciding
if they are using namespace features.  While we could do
timing tests to help people understand what the impact may
or may not be of using NS in the documents they parse, it
obviously has nothing to do with whether or not you are
going to expect a parser to handle NS if the docs contain NS.
That will be the developer's problem, not yours, yes?


Re: [digester2] performance of ns-aware parsing

Posted by Simon Kitching <sk...@apache.org>.

On Sat, 2005-02-05 at 21:02 -0800, Reid Pinchback wrote:
> --- Simon Kitching <sk...@apache.org> wrote:
> > >  Mucking with (d) is supposed to result in significant
> > > wins when you tune the grammar handling to your app, but I haven't tried it 
> > > myself and I've never seen timing differences quoted.  
> > > 
> > 
> > I don't quite understand what (d) means, but is it actually relevant?
> > Again, we are talking about *namespaces* not validation.
> 
> Yes... and every entity (Element and Attribute) is jammed through a
> resolution process first.  Remember XML attributes with default values?
> Guess where those values are identified and handed to the parser - during
> the resolution process.  Namespaces just add more data to shuffle
> around during the resolution process.

Well, in a document that doesn't use namespaces, the penalty is zero.

In a document that uses namespaces, there are a few xmlns:... attributes
floating around. But these have to be handled by the DTD processor
regardless of whether namespace processing is enabled or not, yes?

I don't see where namespaces add any extra data for a DTD processor to
deal with during the "infoset augmentation" stage.


> 
> > What I'm trying to achieve is to avoid having actions or patterns deal
> > with element-names containing prefixes, eg stating that an element's
> > name is "foo:item". This is just broken; the item's name is really the
> > tuple (some-namespace, item).
> > 
> > Grammars/schemas can optionally be bound to namespaces, but namespaces
> > themselves are a lower layer that can be used without any of these
> > things. I'm talking here about requiring the parser to convert
> > <foo:item> into (namespace, item) but do not intend to imply that any
> > kind of schema should be loaded for the specified namespace. 
> 
> That sounds sensible.
> 
> > The XMLReader.setNamespaceAware(true) method does exactly this; enables
> > mapping of prefixes -> namespaces, but does not enable processing of
> > either DTDs or schemas.
> 
> I don't think it actually has any impact at all on DTD processing.
> DTDs, if declared, are always processed unless you install an entity 
> resolver that excises that activity out.

You are right; DTDs get processed in the same manner regardless of
whether the parser is namespace-aware or not. What I meant was
namespaceAware does not affect the parser's handling of DTDs or schemas
(though it is a prerequisite for schema validation).

> 
> > >  I agree
> > > that old parsers providing (c) aren't particularly interesting, but
> > > if you spend any time tracing through the guts of the parsing, particularly
> > > when you see how DTDs are loaded for entity resolution, you begin to see 
> > > (d) as having potential.  Throwing (b) away may result in less code in
> > > Digester2, but it may be worth doing some timing tests to see if that 
> > > code reduction is consequence-free.
> > 
> > What does loading DTDs have to do with namespaces?
> 
> As you said, the XML spec doesn't require that the namespaces mean
> anything, and hence it is possible that a parser won't try to resolve
> and validate against multiple DTDs, but I haven't ever traced through
> the code in a situation where there were multiple namespaces to
> resolve against, so I don't know if there is relationship there or not.
> In general, if a parser thinks it needs a DTD in order to understand
> a document, it tends to grab it.  

I presume you're using "DTD" as a general term covering both traditional
DTDs (which are not namespace-aware) and w3c schemas?

An xml parser does need to read a DTD regardless of whether validation
is enabled or not, for the reasons you pointed out: default attributes,
entity definitions etc.
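
The default-attribute point can be illustrated with a small sketch. It
assumes the JAXP parser bundled with a typical JDK; the class name
DefaultAttributeDemo and the "colour" attribute are made up for the
example. Validation is left off, yet the default declared in the
internal DTD subset is still reported to the handler.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: shows a DTD default reaching a non-validating parse.
final class DefaultAttributeDemo {

    /** Returns the value of the "colour" attribute on the root element. */
    static String colourOfRoot(String xml) {
        final String[] colour = new String[1];
        final boolean[] seenRoot = {false};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(false);   // no DTD validation requested
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (!seenRoot[0]) {
                            seenRoot[0] = true;
                            colour[0] = atts.getValue("colour");
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return colour[0];
    }

    public static void main(String[] args) {
        // The document never writes colour=...; the DTD supplies it.
        String xml = "<!DOCTYPE item [<!ATTLIST item colour CDATA 'red'>]>"
                   + "<item/>";
        System.out.println(colourOfRoot(xml));
    }
}
```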

But w3c xml schemas deliberately don't have any functionality that
affects the infoset of the document. So if you're not validating you can
completely ignore any xml schema - and parsers do. To double-check, I
tested this today, and verified the entity resolver isn't called to
resolve xsi:schemaLocation references unless validation is enabled.
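
A check along those lines might look like the following. This is a
sketch, not the test that was actually run: the class name, namespace
URI and schema URL are invented, and the behaviour shown is that of the
JAXP parser bundled with a typical JDK, so other parsers may differ.
The resolver hands back an empty source so that nothing is fetched
from the network even if it were called.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: counts entity-resolver calls in a non-validating parse.
final class SchemaLocationDemo {

    static final String XML =
        "<doc xmlns='urn:example:doc'"
      + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
      + " xsi:schemaLocation='urn:example:doc http://example.invalid/doc.xsd'/>";

    /** Parses xml without validation; returns how often resolveEntity ran. */
    static int resolverCalls(String xml) {
        final int[] calls = {0};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setValidating(false);
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public InputSource resolveEntity(String publicId,
                                                     String systemId) {
                        calls[0]++;
                        // Empty replacement entity: never touch the network.
                        return new InputSource(new StringReader(""));
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return calls[0];
    }

    public static void main(String[] args) {
        System.out.println("resolver calls: " + resolverCalls(XML));
    }
}
```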

> I don't know if there are situations
> where it tries to interpret namespace declarations as public ids for DTDs.

No, xml parsers never dereference namespace-uris to load either DTDs or
schemas. The only way to reference a schema from an xml document is via
  xsi:schemaLocation="namespace url"

I think some XML editing programs do try to load schemas based upon the
namespace URI (eg jEdit, XMLSpy) but this is quite different (and
probably against the xml standard).


> > > > I still find it hard to believe that leaving out namespace support makes
> > > > a performance difference. The parser needs to keep a map of
> > > >    prefix->(stack of namespace)
> > > > and that's about it. 
> 
> I stopped using belief as a measurement of code a long time
> ago.  Usually only works when I wrote all the code.  :-)
> I'll cook up an experiment and see what I can come up with
> in the way of timing information.

That would be excellent. I look forward to seeing the results..


Regards,

Simon



Re: [digester2] performance of ns-aware parsing

Posted by Reid Pinchback <re...@yahoo.com>.

--- Simon Kitching <sk...@apache.org> wrote:

> On Thu, 2005-02-03 at 07:52 -0800, Reid Pinchback wrote: 
> > Even for Sax the performance difference between (a) and (b) is roughly 
> > a factor of 2 across all parsers when processing small (typical message-sized) 
> > docs that don't use NS. 
> 
> I would *really* love to see some actual measurements on this if you can
> find some. You seem to be quoting from some study you have done or read
> - it would be great to have this. [See comments on Piccolo below]

Take another look at the Piccolo data, and compare the 2 Soap examples
to the random no-NS data.  The difference between the two Soap examples
isn't material because both use NS, so in a sense you have a couple of
different samples of NS data, and in the random case you have another
sample, but I agree it would be better to create tests that were better
understood in order to decide what the difference was.

> >  Mucking with (d) is supposed to result in significant
> > wins when you tune the grammar handling to your app, but I haven't tried it 
> > myself and I've never seen timing differences quoted.  
> > 
> 
> I don't quite understand what (d) means, but is it actually relevant?
> Again, we are talking about *namespaces* not validation.

Yes... and every entity (Element and Attribute) is jammed through a
resolution process first.  Remember XML attributes with default values?
Guess where those values are identified and handed to the parser - during
the resolution process.  Namespaces just add more data to shuffle
around during the resolution process.

> What I'm trying to achieve is to avoid having actions or patterns deal
> with element-names containing prefixes, eg stating that an element's
> name is "foo:item". This is just broken; the item's name is really the
> tuple (some-namespace, item).
> 
> Grammars/schemas can optionally be bound to namespaces, but namespaces
> themselves are a lower layer that can be used without any of these
> things. I'm talking here about requiring the parser to convert
> <foo:item> into (namespace, item) but do not intend to imply that any
> kind of schema should be loaded for the specified namespace. 

That sounds sensible.

> The XMLReader.setNamespaceAware(true) method does exactly this; enables
> mapping of prefixes -> namespaces, but does not enable processing of
> either DTDs or schemas.

I don't think it actually has any impact at all on DTD processing.
DTDs, if declared, are always processed unless you install an entity 
resolver that excises that activity out.

> >  I agree
> > that old parsers providing (c) aren't particularly interesting, but
> > if you spend any time tracing through the guts of the parsing, particularly
> > when you see how DTDs are loaded for entity resolution, you begin to see 
> > (d) as having potential.  Throwing (b) away may result in less code in
> > Digester2, but it may be worth doing some timing tests to see if that 
> > code reduction is consequence-free.
> 
> What does loading DTDs have to do with namespaces?

As you said, the XML spec doesn't require that the namespaces mean
anything, and hence it is possible that a parser won't try to resolve
and validate against multiple DTDs, but I haven't ever traced through
the code in a situation where there were multiple namespaces to
resolve against, so I don't know if there is relationship there or not.
In general, if a parser thinks it needs a DTD in order to understand
a document, it tends to grab it.  I don't know if there are situations
where it tries to interpret namespace declarations as public ids for DTDs.
If that happens, then those DTDs would also be loaded by the parser
and namespaces would have to be matched to the appropriate collections
of contexts during entity resolution.

> > > I still find it hard to believe that leaving out namespace support makes
> > > a performance difference. The parser needs to keep a map of
> > >    prefix->(stack of namespace)
> > > and that's about it. 

I stopped using belief as a measurement of code a long time
ago.  Usually only works when I wrote all the code.  :-)
I'll cook up an experiment and see what I can come up with
in the way of timing information.

> Sorry, what per-entity operations, and what temporary object creations?

The Jade/Javolution author wrote a fair bit about that, I'll see
if I can find his pages.  I couldn't find the details at the
Javolution site; when Jade was separate he indicated that the
String operations required to satisfy the SAX API semantics 
dragged down performance heavily.

> >   Zapthink comments on XML parsing challenges,
> >   http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci858888,00.html
> 
> No occurrence of the word "namespace" anywhere in the article.

For this and other similar concepts, it helps to start associating
namespaces with other aspects of parsing internals.  Elements and 
attributes have to be "matched up" to their definitions - the 
resolution process.  Namespaces are an aspect of the match up, just 
more information to shuffle around and perform string compares against.
Take a look at all the elements and attributes in a (e.g. 10K document), 
calculate all the callbacks invoked, and any activity that adds a 
per-callback load has potential for impacting performance.  That is 
why Jade put effort into eliminating String creations, because those 
were proportional to the number of entities parsed.  Folks who try
to speed up parsers seem to follow 1 of 2 approaches:
  1. eliminate per-entity costs (same idea as factoring ops out of loops)
  2. avoid per-entity costs (e.g. pull parsers and deferred DOM parsers)


Re: [digester2] performance of ns-aware parsing

Posted by Simon Kitching <sk...@apache.org>.

On Thu, 2005-02-03 at 07:52 -0800, Reid Pinchback wrote: 
> --- Simon Kitching <sk...@apache.org> wrote:
> 
> > On Wed, 2005-02-02 at 20:45 -0800, Reid Pinchback wrote:
> > Of course if someone can demonstrate that non-namespace-aware parsers
> > *are* still useful then I'll change my mind.
> 
> Just to clarify, since I was being sloppy before (I gotta
> stop typing in shorthand) there is an important distinction:
> 
> a) having NS-aware parser, always using NS-aware API methods
> b) having NS-aware parser, selectively using NS-aware API methods
> c) having non-NS-aware parser (and obviously never using NS-aware API methods)
> d) having NS-aware parser where the developer fixes a grammar that
>    ignores any NS distinctions
> 

> Even for Sax the performance difference between (a) and (b) is roughly 
> a factor of 2 across all parsers when processing small (typical message-sized) 
> docs that don't use NS. 

I would *really* love to see some actual measurements on this if you can
find some. You seem to be quoting from some study you have done or read
- it would be great to have this. [See comments on Piccolo below]

>  Mucking with (d) is supposed to result in significant
> wins when you tune the grammar handling to your app, but I haven't tried it 
> myself and I've never seen timing differences quoted.  
> 

I don't quite understand what (d) means, but is it actually relevant?
Again, we are talking about *namespaces* not validation.

The w3c namespaces spec clearly makes a distinction between namespaces
and whether or not the namespace URI "means" anything:

<quote source="http://www.w3c.org/TR/xml-names11/">
Note also that the Namespaces specification says nothing about what
might (or might not) happen if one were to attempt to dereference a
URI/IRI used to identify a namespace.
</quote>

What I'm trying to achieve is to avoid having actions or patterns deal
with element-names containing prefixes, eg stating that an element's
name is "foo:item". This is just broken; the item's name is really the
tuple (some-namespace, item).

Grammars/schemas can optionally be bound to namespaces, but namespaces
themselves are a lower layer that can be used without any of these
things. I'm talking here about requiring the parser to convert
<foo:item> into (namespace, item) but do not intend to imply that any
kind of schema should be loaded for the specified namespace. 

The XMLReader.setNamespaceAware(true) method does exactly this; enables
mapping of prefixes -> namespaces, but does not enable processing of
either DTDs or schemas.
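
As a concrete illustration, here is a minimal sketch of namespace-aware
parsing with no validation. Note that in JAXP the setNamespaceAware
method lives on SAXParserFactory; a bare XMLReader exposes the same
switch through the http://xml.org/sax/features/namespaces feature. The
class name and the urn:example:ns namespace are made up for the
example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: reports an element as the tuple (namespace, name).
final class NamespaceAwareDemo {

    /** Returns the root element's name as "{namespace-uri}localName". */
    static String rootName(String xml) {
        final String[] result = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);   // map prefixes to namespaces
            factory.setValidating(false);      // no validation requested
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (result[0] == null) {
                            result[0] = "{" + uri + "}" + localName;
                        }
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        // Different prefixes, same namespace: the reported tuple is equal.
        System.out.println(rootName("<foo:item xmlns:foo='urn:example:ns'/>"));
        System.out.println(rootName("<bar:item xmlns:bar='urn:example:ns'/>"));
    }
}
```

Both calls print {urn:example:ns}item, which is why matching on the
prefixed string "foo:item" is fragile: the prefix is the document
author's choice, while the (namespace, name) pair is stable.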

> I'm not trying to advocate any approach except to notice that, since your 
> README mentioned requiring a namespace-aware parser, it sounded like 
> there was a potential for options (b), (c), and (d) to become unintentionally
> closed to developers in Digester2 when they weren't in Digester1. 

Well, I did intend to close options (b) and (c) as I didn't believe
there was any reason at all to support them. Some real measurements
showing the kind of performance you quote would definitely change my
mind.

>  I agree
> that old parsers providing (c) aren't particularly interesting, but
> if you spend any time tracing through the guts of the parsing, particularly
> when you see how DTDs are loaded for entity resolution, you begin to see 
> (d) as having potential.  Throwing (b) away may result in less code in
> Digester2, but it may be worth doing some timing tests to see if that 
> code reduction is consequence-free.

What does loading DTDs have to do with namespaces?

> > I still find it hard to believe that leaving out namespace support makes
> > a performance difference. The parser needs to keep a map of
> >    prefix->(stack of namespace)
> > and that's about it. 
> 
> Actually the XML spec distinguishes between the default namespace
> and all other namespaces, so parsers can reasonably make the same
> distinction and try to avoid a bunch of per-entity operations and 
> temporary object creations in the case where there is no namespace.

Sorry, what per-entity operations, and what temporary object creations?

> Look at the piccolo stats published on Sourceforge.  Compare Soap, 
> Soap+NS, and random XML-no NS timings and it suggests that NS 
> ain't free.
> 
> Useful links:
> 
>   Jade (now part of Javolution) http://javolution.org/api/index.html,
>   look at the javolution.xml package (trades String for CharSequence
>   to increase performance, but keeps NS)

Hmm.. I've added a reference to javolution to the wiki. 

However I couldn't find any info on the performance of namespaceAware vs
nonNamespaceAware...

> 
>   Piccolo you probably already have the link for, but for anybody
>   else interested: http://piccolo.sourceforge.net

Piccolo does have a page where they state that their performance test for
"SOAP - namespaces off" is about 12% faster than "SOAP - namespaces on".
But there is no further info on what these phrases mean.

The piccolo site provides a download for "SAXBench" benchmarking tool,
but (a) I never managed to get this working, and (b) it doesn't seem to
include the SOAP tests referenced anyway.

http://piccolo.sourceforge.net/bench.html

> 
>   Zapthink comments on XML parsing challenges,
>   http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci858888,00.html

No occurrence of the word "namespace" anywhere in the article.

> 
>   Developerworks articles on XML performance,
>   http://www-106.ibm.com/developerworks/xml/library/x-perfap1.html
> 

This article had this paragraph:
<quote>
You should also avoid using namespaces in your applications unless
they're absolutely necessary. Processing a document with the namespace
feature enabled can slow the processing of the whole document. A parser
not only processes namespace declarations, verifying their correctness,
but it also ensures that an XML document is namespace well-formed.
</quote>
but I believe this refers only to code that builds DOMs then serializes
them; during serialization the DOM tree is checked to make sure all
elements have valid namespace declarations. This is not relevant to
digester.

>   Sun articles on XML performance,
>   http://java.sun.com/developer/technicalArticles/xml/JavaTechandXML_part3/

This article didn't seem to have any performance info about namespaces. 

So in summary: 

My instincts still tell me that:
* for documents that don't use namespaces, enabling namespace-aware
parsing will have no impact at all. 
* for documents that do use namespaces, sane coders will want proper
namespace-aware support anyway
* for performance-maniacs of the sort who would deliberately process
documents with namespaces using a non-namespace-aware parser in order to
get faster performance, they are out of luck and will have to wear a
performance hit of about 1%. Or they can patch digester themselves.

The piccolo stats suggest they tested *something* to do with namespaces
and got a 12% hit, but as no further details are provided it's hard to
tell whether this is relevant or not.

For the moment, therefore, I don't intend to add non-ns-aware-parser
support for digester2. Anyone else is very welcome to provide a proper
performance test that proves me wrong at which time I will offer my
congratulations and personally commit their patch to add this feature.
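
For anyone who wants to try, a starting point for such a test might
look like the sketch below. It is a crude wall-clock loop, not a proper
benchmark: the class name, document shape and iteration counts are all
arbitrary assumptions, it reuses one parser per run, and a serious test
would want real documents and a benchmarking harness. It prints
whatever it measures; no particular result is implied.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: times ns-aware vs non-ns-aware SAX parsing.
final class NsTiming {

    /** Builds a small document that uses no namespaces at all. */
    static String sampleDocument(int items) {
        StringBuilder sb = new StringBuilder("<order>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id='").append(i).append("'>text</item>");
        }
        return sb.append("</order>").toString();
    }

    static SAXParser newParser(boolean nsAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(nsAware);
            return factory.newSAXParser();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Parses xml once; returns the number of startElement callbacks. */
    static int countElements(SAXParser parser, String xml) {
        final int[] count = {0};
        try {
            parser.parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        count[0]++;
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return count[0];
    }

    /** Returns elapsed nanoseconds for the given number of parses. */
    static long timeParses(boolean nsAware, String xml, int iterations) {
        SAXParser parser = newParser(nsAware);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            countElements(parser, xml);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String xml = sampleDocument(100);
        // Warm-up runs so the JIT has compiled the hot paths.
        timeParses(true, xml, 2000);
        timeParses(false, xml, 2000);
        System.out.println("ns-aware:     " + timeParses(true, xml, 10000) + " ns");
        System.out.println("non-ns-aware: " + timeParses(false, xml, 10000) + " ns");
    }
}
```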

Regards,

Simon


Re: [digester] initial code for Digester2.0

Posted by Reid Pinchback <re...@yahoo.com>.

--- Simon Kitching <sk...@apache.org> wrote:

> On Wed, 2005-02-02 at 20:45 -0800, Reid Pinchback wrote:
> Of course if someone can demonstrate that non-namespace-aware parsers
> *are* still useful then I'll change my mind.

Just to clarify, since I was being sloppy before (I gotta
stop typing in shorthand) there is an important distinction:

a) having NS-aware parser, always using NS-aware API methods
b) having NS-aware parser, selectively using NS-aware API methods
c) having non-NS-aware parser (and obviously never using NS-aware API methods)
d) having NS-aware parser where the developer fixes a grammar that
   ignores any NS distinctions

Even for Sax the performance difference between (a) and (b) is roughly 
a factor of 2 across all parsers when processing small (typical message-sized) 
docs that don't use NS.  Mucking with (d) is supposed to result in significant
wins when you tune the grammar handling to your app, but I haven't tried it 
myself and I've never seen timing differences quoted.  

I'm not trying to advocate any approach except to notice that, since your 
README mentioned requiring a namespace-aware parser, it sounded like 
there was a potential for options (b), (c), and (d) to become unintentionally
closed to developers in Digester2 when they weren't in Digester1.  I agree
that old parsers providing (c) aren't particularly interesting, but
if you spend any time tracing through the guts of the parsing, particularly
when you see how DTDs are loaded for entity resolution, you begin to see 
(d) as having potential.  Throwing (b) away may result in less code in
Digester2, but it may be worth doing some timing tests to see if that 
code reduction is consequence-free.

> I still find it hard to believe that leaving out namespace support makes
> a performance difference. The parser needs to keep a map of
>    prefix->(stack of namespace)
> and that's about it. 

Actually the XML spec distinguishes between the default namespace
and all other namespaces, so parsers can reasonably make the same
distinction and try to avoid a bunch of per-entity operations and 
temporary object creations in the case where there is no namespace.
Look at the piccolo stats published on Sourceforge.  Compare Soap, 
Soap+NS, and random XML-no NS timings and it suggests that NS 
ain't free.

Useful links:

  Jade (now part of Javolution) http://javolution.org/api/index.html,
  look at the javolution.xml package (trades String for CharSequence
  to increase performance, but keeps NS)

  Piccolo you probably already have the link for, but for anybody
  else interested: http://piccolo.sourceforge.net

  Zapthink comments on XML parsing challenges,
  http://searchwebservices.techtarget.com/originalContent/0,289142,sid26_gci858888,00.html

  Developerworks articles on XML performance,
  http://www-106.ibm.com/developerworks/xml/library/x-perfap1.html

  Sun articles on XML performance,
  http://java.sun.com/developer/technicalArticles/xml/JavaTechandXML_part3/

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Wed, 2005-02-02 at 20:45 -0800, Reid Pinchback wrote:
> --- Simon Kitching <sk...@apache.org> wrote:
> 
> > Supporting namespaces in an xml parser seems very simple to me. I think
> > it much more likely that only antique and unmaintained parsers fail to
> > support namespaces. And people who are determined to use antique and
> > unmaintained parsers can just stick with digester 1.x as far as I am
> > concerned. I'm not pushing for digester to remove non-namespace-aware
> > support - just digester2!
> 
> Wow, that is an unexpectedly harsh reaction.  My reason for asking 
> was simple, and I believe not unreasonable.   You were the one asking
> for feedback on your proposal. 

Sorry, Reid. I didn't mean it that way. I apologise for any offense.
I was just stating my personal opinion - that every app eventually drops
support for obsolete technologies, and I think it's time to drop support
for non-namespace-aware parsers. 

I was serious, too, about users of old technology sticking with digester
1.x. I'm aware that upgrading core libs can sometimes be a pain, but
digester1.x is still there and isn't going away (I'm one of the
maintainers for that code, and have every intention of continuing that
even when 2.0 is out). I just don't see the point of migrating the
"backwards compatibility" code from the 1.x series. 

Of course if someone can demonstrate that non-namespace-aware parsers
*are* still useful then I'll change my mind.

> Using the namespace-based API of an XML parser is known to reduce throughput substantially, 
> covered in a host of Java xml mag articles, available from google searches, and
> one or two of the Java performance tuning books still in distribution.  XML 
> performance tuning is a tough area, and people continually struggle with it.
> I don't recall the SAX-only stats, but I know that for DOM parsers you can 
> shoot for an increase in XML processing bandwidth by an order of magnitude through 
> a change in parser and not using NS.  Antiqueness of parsers isn't the issue.

Is there any chance you could provide a reference to such an article?

I still find it hard to believe that leaving out namespace support makes
a performance difference. The parser needs to keep a map of
   prefix->(stack of namespace)
and that's about it. Surely that's a fraction of a percent of the parser
performance, memory usage, and processing time. So why wouldn't a parser
do it?
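
To make the bookkeeping I have in mind concrete, here is a rough sketch of such a prefix->(stack of namespace) map. The class and method names are invented for illustration (they only echo the SAX callback names) and this is not taken from any real parser. It also says nothing about the per-name cost of splitting and resolving each element and attribute name, which is the part Reid is concerned about.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of per-prefix namespace scoping. */
final class PrefixScopes {
    // For each prefix, a stack of URIs; the top is the binding currently in scope.
    private final Map<String, Deque<String>> bindings = new HashMap<>();

    /** Called for each xmlns:prefix="uri" declaration on a start tag. */
    void startPrefixMapping(String prefix, String uri) {
        bindings.computeIfAbsent(prefix, p -> new ArrayDeque<>()).push(uri);
    }

    /** Called when the element that declared the prefix ends. */
    void endPrefixMapping(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        if (stack != null && !stack.isEmpty()) {
            stack.pop();
        }
    }

    /** Returns the URI currently bound to the prefix, or null if unbound. */
    String resolve(String prefix) {
        Deque<String> stack = bindings.get(prefix);
        return (stack == null) ? null : stack.peek();
    }
}
```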

Leaving out *validation* would improve performance and footprint
significantly, but validation and namespace support are unrelated.

I had a quick look for high-performance/small-footprint xml parsers:
 parser      NS-support     maintained?
 Piccolo       y              y
 Aelfred       y              y
 ElectricXML   y              n? (can't find a current website)
 MinML         n              n (last release nov 2001)
 NanoXML       y              n (last release april 2002)
 JapiSoft      y              y (commercial)

I also googled for "xml parser performance namespace" but didn't get
anything relevant.

> I think it helps to keep in mind that NS was intended as a way of creating 
> name-resolution scopes that allow the merging of document structures from 
> different origins that otherwise could experience element and attribute
> name clashes.  When somebody has an application that doesn't require that 
> kind of merging, and they aren't using a namespace-dependent XML technology 
> like Soap or XMLSchema, then using NS features of an NS parser can
> be a burden without corresponding benefit.  Under the hood, that parser has 
> to do a lot of work to continually manage the NS resolution of the node names.
> It has no way of knowing that the work is pointless - you've told it to
> assume that there is a point when you use the NS features.

True. Namespaces are not relevant in many contexts. But as noted above,
I do find it hard to believe that "parser has to do a lot of work to
manage NS resolution". If you can show me I'm wrong, I'll buy you a
beer ;-)

Regards,

Simon

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Reid Pinchback <re...@yahoo.com>.

--- Simon Kitching <sk...@apache.org> wrote:

> Supporting namespaces in an xml parser seems very simple to me. I think
> it much more likely that only antique and unmaintained parsers fail to
> support namespaces. And people who are determined to use antique and
> unmaintained parsers can just stick with digester 1.x as far as I am
> concerned. I'm not pushing for digester to remove non-namespace-aware
> support - just digester2!

Wow, that is an unexpectedly harsh reaction.  My reason for asking 
was simple, and I believe not unreasonable.   You were the one asking
for feedback on your proposal. 

Using the namespace-based API of an XML parser is known to reduce throughput substantially, 
covered in a host of Java xml mag articles, available from google searches, and
one or two of the Java performance tuning books still in distribution.  XML 
performance tuning is a tough area, and people continually struggle with it.
I don't recall the SAX-only stats, but I know that for DOM parsers you can 
shoot for an increase in XML processing bandwidth by an order of magnitude through 
a change in parser and not using NS.  Antiqueness of parsers isn't the issue.

I think it helps to keep in mind that NS was intended as a way of creating 
name-resolution scopes that allow the merging of document structures from 
different origins that otherwise could experience element and attribute
name clashes.  When somebody has an application that doesn't require that 
kind of merging, and they aren't using a namespace-dependent XML technology 
like Soap or XMLSchema, then using NS features of an NS parser can
be a burden without corresponding benefit.  Under the hood, that parser has 
to do a lot of work to continually manage the NS resolution of the node names.
It has no way of knowing that the work is pointless - you've told it to
assume that there is a point when you use the NS features.

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Wed, 2005-02-02 at 05:48 -0800, Reid Pinchback wrote:
> One section of the release notes says:
> 
>     The Digester now *always* uses a namespace-aware xml parser.
> 
> I was wondering why this is.  There are a lot of XML parsers
> out there, and some of them have done things like trade
> namespace awareness for performance.  If somebody has an
> application where namespaces aren't an issue, why should
> they be limited to only using a namespace-aware parser?
> Not something that seems like an important issue if you are
> just using a Digester to process some kind of app config
> file, but is an issue if processing streams of XML data
> is fundamentally what the app is about.
> 

Supporting namespaces in an xml parser seems very simple to me. I think
it much more likely that only antique and unmaintained parsers fail to
support namespaces. And people who are determined to use antique and
unmaintained parsers can just stick with digester 1.x as far as I am
concerned. I'm not pushing for digester to remove non-namespace-aware
support - just digester2!

Removing code that handles non-namespace parsers from digester
simplifies the code and reduces the library size. It also pushes users
to write their code correctly; code that processes XML and doesn't
handle namespaces is fundamentally broken and we shouldn't be providing
tools that encourage people to write broken code.

However if you can give an example of a modern and maintained xml parser
that deliberately doesn't support namespaces in order to improve
performance or reduce footprint, I will gladly reconsider.

Or of course if the consensus here favours supporting broken parsers :-)

Regards,

Simon

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Reid Pinchback <re...@yahoo.com>.

One section of the release notes says:

    The Digester now *always* uses a namespace-aware xml parser.

I was wondering why this is.  There are a lot of XML parsers
out there, and some of them have done things like trade
namespace awareness for performance.  If somebody has an
application where namespaces aren't an issue, why should
they be limited to only using a namespace-aware parser?
Not something that seems like an important issue if you are
just using a Digester to process some kind of app config
file, but is an issue if processing streams of XML data
is fundamentally what the app is about.




		
---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Tue, 2005-02-01 at 16:20 +0100, Emmanuel Bourg wrote:
> Reid Pinchback wrote:
> 
> > I strongly agree.  Cyclic package dependencies seem
> > unimportant when you only have a few classes, but as the
> > amount of code grows, you quickly find that testing and
> > refactoring become much more difficult than it had to be.
> 
> Can you give an example of a difficult refactoring due to a cyclic 
> dependency between 2 packages? I'm not sure I understand the practical 
> issue.

Well, I don't know about the refactoring issues. But I prefer avoiding
cyclic dependencies because:

* You can learn the classes in packages in order, without bouncing back
and forth between packages
* javac, javadoc, UML diagramming tools, etc. can process code in
directory order without having to bounce back and forth. This just has
to improve performance and reliability.
* you can trim down a distribution by progressively leaving out packages
* when porting code or revising code (including refactoring) you can do
  this in a progressive manner, starting with the package at the root of
  the dependency tree and working forward rather than having to migrate
  classes scattered across a selection of packages.
* having clean package dependencies encourages lower code coupling. 
  Quite often I find it prompts me to create clean interfaces to break
  inter-package dependencies, and I then find those interfaces are 
  sensible for many reasons.

Regards,

Simon

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Reid Pinchback <re...@yahoo.com>.

Sure thing.  Just to make it easier to envision, let's get
packages out of the equation.  Just think about cyclic
dependencies between two classes in the same package.
That is enough to show the problem; packages just add complexity 
because the dependencies can be much harder to detect visually 
(usually you would use something like JDepend to spot them) 
and harder to unwind.

Refactoring is harder simply because you have to do a larger
number of smaller steps.  Doesn't mean impossible, more steps
just mean more work, more time, more money.  Tricky enough when 
only two classes are involved, harder as the number of classes 
involved in the cycle increases.  Get enough classes involved, 
and you start to hear statements like "it will be easier to 
throw that away and start over again than it will be to fix it".

class A {
  int a;
  int fooA(int arg) {
    // 1a. do stuff with {B.fooB,a,arg}
    // 2a. do other stuff with result and {a}
  }
}

class B {
  int b1, b2;
  int fooB(int arg) {
    // 1b. do stuff with {A.fooA,b1,arg}
    // 2b. do other stuff with result and b2
    // 3b. do stuff with {A.fooA,b2,arg}
  }
}


Refactoring remains possible, but tricky because
you have both compile-time code dependencies and
run-time state dependencies.  You are faced with 
things like factoring out small fragments of code 
into helper classes, and maybe introducing an 
interface to at least eliminate the compile-time
dependency between A and B, even if the run-time
dependency remains.  

Often the solution ends up something like

a) make interface I
b) create class C implements I
   and migrate some of A and B state into C
c) modify A and B to share I

It works, it just takes time... and often you
are doing it before even trying to tackle whatever
bug or feature enhancement you were faced with
in the first place.
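
A rough sketch of where steps (a) through (c) might end up, reusing the A, B, I and C names from above. The arithmetic inside each method is a placeholder I made up purely so the code compiles and runs; the point is only that A and B no longer refer to each other.

```java
/** Step (a): the interface both classes depend on instead of each other. */
interface I {
    int apply(int arg);
}

/** Step (b): state and logic that A and B previously reached across for. */
final class C implements I {
    private final int offset; // placeholder for the migrated state

    C(int offset) {
        this.offset = offset;
    }

    @Override
    public int apply(int arg) {
        return arg + offset;
    }
}

/** Step (c): A knows only the interface, never B. */
final class A {
    private final int a;
    private final I shared;

    A(int a, I shared) {
        this.a = a;
        this.shared = shared;
    }

    int fooA(int arg) {
        return shared.apply(arg) * a;
    }
}

/** Step (c): B knows only the interface, never A. */
final class B {
    private final int b1;
    private final int b2;
    private final I shared;

    B(int b1, int b2, I shared) {
        this.b1 = b1;
        this.b2 = b2;
        this.shared = shared;
    }

    int fooB(int arg) {
        return shared.apply(arg + b1) + b2;
    }
}
```

Once the cycle is gone, A and B can each be tested against a stub implementation of I.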




		
---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Emmanuel Bourg <eb...@apache.org>.

Reid Pinchback wrote:

> I strongly agree.  Cyclic package dependencies seem
> unimportant when you only have a few classes, but as the
> amount of code grows, you quickly find that testing and
> refactoring become much more difficult than it had to be.

Can you give an example of a difficult refactoring due to a cyclic 
dependency between 2 packages? I'm not sure I understand the practical 
issue.

Emmanuel Bourg

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Reid Pinchback <re...@yahoo.com>.

--- Simon Kitching <sk...@apache.org> wrote:

> Ok, we'll see what the general consensus is. I happen to personally like
> prefixes rather than suffixes, but will go with the majority opinion.

I vote for prefixes.

> That sounds reasonable. However I do dislike having mutual dependencies
> between java packages; a DAG (directed acyclic graph) is good for a
> number of reasons. 

I strongly agree.  Cyclic package dependencies seem
unimportant when you only have a few classes, but as the
amount of code grows, you quickly find that testing and
refactoring become much more difficult than it had to be.

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Mon, 2005-01-31 at 11:23 +0100, Emmanuel Bourg wrote:
> "XXXRule --> ActionXXX for all XXX
>    By using a prefix instead of a suffix, all the Action classes group
>    nicely together in the javadoc."
> 
> I tend to prefer the type as a suffix,

Ok, we'll see what the general consensus is. I happen to personally like
prefixes rather than suffixes, but will go with the majority opinion.

>  to keep them grouped in the 
> javadoc I would rather use an "action(s)" subpackage. With or without 
> 's' is another debate ;)

That sounds reasonable. However I do dislike having mutual dependencies
between java packages; a DAG (directed acyclic graph) is good for a
number of reasons. 

So if we have an "o.a.c.d.actions" package for the standard actions,
then we probably need an "o.a.c.d.factory" package so the ActionFactory
class (which now holds the old Digester.addXXXRule factory methods) can
be pushed down into it. We would then have dependencies of:
 o.a.c.d.actions --> o.a.c.d
 o.a.c.d.factory --> o.a.c.d.actions, o.a.c.d
which is acceptable.
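
As a quick sanity check that the proposed layout really is a DAG, something like the following sketch could be used. PackageGraph is a hypothetical helper written for this mail, not an existing tool (JDepend does this properly), and the edges are entered by hand rather than derived from the source.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: records package dependencies and checks for cycles. */
final class PackageGraph {
    private final Map<String, List<String>> deps = new LinkedHashMap<>();

    /** Records that package "from" depends on package "to". */
    void addDependency(String from, String to) {
        deps.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        deps.computeIfAbsent(to, k -> new ArrayList<>());
    }

    /** Returns true if no package depends, directly or indirectly, on itself. */
    boolean isAcyclic() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String pkg : deps.keySet()) {
            if (hasCycleFrom(pkg, done, onPath)) {
                return false;
            }
        }
        return true;
    }

    // Depth-first search; a package seen again while still on the current
    // path means there is a cycle.
    private boolean hasCycleFrom(String pkg, Set<String> done, Set<String> onPath) {
        if (onPath.contains(pkg)) {
            return true;
        }
        if (done.contains(pkg)) {
            return false;
        }
        onPath.add(pkg);
        for (String next : deps.get(pkg)) {
            if (hasCycleFrom(next, done, onPath)) {
                return true;
            }
        }
        onPath.remove(pkg);
        done.add(pkg);
        return false;
    }
}
```

Feeding in the three dependencies listed above gives an acyclic graph; adding a dependency from o.a.c.d back to o.a.c.d.actions (which is what leaving ActionFactory in the base package would do) makes the check fail.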

Thoughts?

Regards,

Simon

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Emmanuel Bourg <eb...@apache.org>.

"XXXRule --> ActionXXX for all XXX
   By using a prefix instead of a suffix, all the Action classes group
   nicely together in the javadoc."

I tend to prefer the type as a suffix, to keep them grouped in the 
javadoc I would rather use an "action(s)" subpackage. With or without 
's' is another debate ;)

Emmanuel Bourg

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Craig McClanahan <cr...@gmail.com>.

As you've discovered, at the technical level Commons karma is
project-wide.  Socially, the practice has been to do exactly what
you've done -- ask to participate and get accepted by the other
developers working on that package.

+1 on Oliver for Digester.  I wish I had time to participate -- the
ideas sound really interesting -- but it's good to see that the
package is being cared for so ably.

Craig


On Fri, 4 Feb 2005 09:21:44 +0100, Oliver Zeigermann
<ol...@gmail.com> wrote:
> On Fri, 04 Feb 2005 21:19:46 +1300, Simon Kitching <sk...@apache.org> wrote:
> > Digester2 is just me so far, though. I'm happy for you to commit to the
> > digester2 directory, and don't think there is anyone else you need to
> > ask.
> 
> Cool. I will need to do some work for money the next two weeks, but
> will contribute the promised stuff ASAP.
> 
> Oliver
> 

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Fri, 4 Feb 2005 08:57:05 +0100, Oliver Zeigermann
<ol...@gmail.com> wrote:
> On Fri, 04 Feb 2005 15:52:16 +1300, Simon Kitching <sk...@apache.org> wrote:
> > > Isn't it about time to give Digester2 a place in SVN, so I can either
> > > create patches against it or  directly commit to it. What about a
> > > branch in commons proper? Or at least the sandbox?
> >
> > Done.
> >
> > Do you have commit rights to Digester? If not, I'd be happy to propose a
> > vote...
> 
> Well, not quite sure, how this is handled, but as I have commit access
> to commons transaction, I should have rights on Digester as well. But
> I seem to remember that it is polite to have some sort of vote done by
> the current committers.
> 
> Who are the current committers? Is there anyone other than you?

Just checked it and I actually have commit access and have shamelessly
added my name to the list of Digester2 developers. I hope that's ok
for everyone, if not I will undo this ASAP.

Oliver

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Fri, 04 Feb 2005 21:19:46 +1300, Simon Kitching <sk...@apache.org> wrote:
> Digester2 is just me so far, though. I'm happy for you to commit to the
> digester2 directory, and don't think there is anyone else you need to
> ask.

Cool. I will need to do some work for money the next two weeks, but
will contribute the promised stuff ASAP.

Oliver

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Wendy Smoak <ja...@wendysmoak.com>.

From: "Simon Kitching" <sk...@apache.org>
>> '  It would be nice for SetProperties and SetNestedProperties rules to
>> automatically map xml attributes and element names like "foo-bar" to bean
>> properties of form "fooBar".  '
>
> If you feel like having a go at this yourself, I would be very happy to
> see a patch.

Much as I'd love to play with Digester instead, homework wins out (for the 
moment at least.)  Let's see how long it takes me to write "a small 
recursive-descent parser that reads in arithmetic expressions, parses each 
one into an abstract syntax tree, and evaluates it".  :)

-- 
Wendy Smoak 



---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Fri, 2005-02-04 at 10:45 -0700, Wendy Smoak wrote:
> (oops, wrong button)
> > Not sure if it's been discussed already, but I'm very much in favor of this
> > (from the Wiki):
> 
> '  It would be nice for SetProperties and SetNestedProperties rules to
> automatically map xml attributes and element names like "foo-bar" to bean
> properties of form "fooBar".  '
> 
> It's actually listed as a possible enhancement for 1.7, but wherever it ends
> up, it will be appreciated.  (Assuming it isn't there already and I missed
> it...)
> 

Thanks for the feedback Wendy. I added that "to-do" item, so I'm sure it
will get added eventually :-).

If you feel like having a go at this yourself, I would be very happy to
see a patch. Otherwise, it is lower on my priority list than getting the
basic digester2 structure sorted so may be quite a while away from being
implemented.

Regards,

Simon

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Wendy Smoak <ja...@wendysmoak.com>.

(oops, wrong button)
> Not sure if it's been discussed already, but I'm very much in favor of this
> (from the Wiki):

'  It would be nice for SetProperties and SetNestedProperties rules to
automatically map xml attributes and element names like "foo-bar" to bean
properties of form "fooBar".  '

It's actually listed as a possible enhancement for 1.7, but wherever it ends
up, it will be appreciated.  (Assuming it isn't there already and I missed
it...)
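
For what it's worth, the name mapping itself could be as small as the sketch below. PropertyNames and toCamelCase are names invented here, not existing Digester or BeanUtils API, and hooking this into the SetProperties and SetNestedProperties rules is the part that would need real design work.

```java
/** Hypothetical helper; not part of Digester or BeanUtils. */
final class PropertyNames {

    private PropertyNames() {
    }

    /** Maps an xml name such as "foo-bar" to a bean property name such as "fooBar". */
    static String toCamelCase(String xmlName) {
        StringBuilder out = new StringBuilder(xmlName.length());
        boolean upperNext = false;
        for (int i = 0; i < xmlName.length(); i++) {
            char c = xmlName.charAt(i);
            if (c == '-') {
                // Drop the hyphen and capitalise whatever follows it.
                upperNext = true;
            } else if (upperNext) {
                out.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Names without a hyphen pass through unchanged, so existing mappings would keep working.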

-- 
Wendy Smoak


---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Wendy Smoak <ja...@wendysmoak.com>.

Not sure if it's been discussed already, but I'm very much in favor of this
(from the Wiki):



---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Fri, 2005-02-04 at 08:57 +0100, Oliver Zeigermann wrote:
> On Fri, 04 Feb 2005 15:52:16 +1300, Simon Kitching <sk...@apache.org> wrote:
> > Do you have commit rights to Digester? If not, I'd be happy to propose a
> > vote...
> 
> Well, not quite sure, how this is handled, but as I have commit access
> to commons transaction, I should have rights on Digester as well.

Ah yes .. I forgot that karma for commons covers all projects..

>  But
> I seem to remember that it is polite to have some sort of vote done by
> the current committers.

Yes. 

> 
> Who are the current committers? Is there anyone other than you?

Robert Donkin also keeps an eye on Digester, though he's been more
involved in Betwixt than Digester recently.

Craig McC certainly qualifies as a committer, but appears to be kept
busy by other projects.

Otherwise, it's just me.

Digester2 is just me so far, though. I'm happy for you to commit to the
digester2 directory, and don't think there is anyone else you need to
ask.

Cheers,

Simon

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Fri, 04 Feb 2005 15:52:16 +1300, Simon Kitching <sk...@apache.org> wrote:
> > Isn't it about time to give Digester2 a place in SVN, so I can either
> > create patches against it or  directly commit to it. What about a
> > branch in commons proper? Or at least the sandbox?
> 
> Done.
> 
> Do you have commit rights to Digester? If not, I'd be happy to propose a
> vote...

Well, not quite sure, how this is handled, but as I have commit access
to commons transaction, I should have rights on Digester as well. But
I seem to remember that it is polite to have some sort of vote done by
the current committers.

Who are the current committers? Is there anyone other than you?

Oliver

---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Thu, 2005-02-03 at 23:36 +0100, Oliver Zeigermann wrote:
> Hi Simon!
> 
> On Thu, 03 Feb 2005 23:57:30 +1300, Simon Kitching <sk...@apache.org> wrote:
> > I look forward to seeing your ideas on stringifying trees of elements.
> 
> Isn't it about time to give Digester2 a place in SVN, so I can either
> create patches against it or  directly commit to it. What about a
> branch in commons proper? Or at least the sandbox?

Done. 

Do you have commit rights to Digester? If not, I'd be happy to propose a
vote...

> > actions*. And I generally do debugging by enabling commons-logging
> > output rather than write custom debugging actions anyway. Can you think
> > of some usecases where this would be useful?
> 
> Hmmm, using SAX it always is a bit tricky to get a good idea what your
> XML document that is being parsed *really* looks like. commons-logging
> is no good in that case. If you have something that collects the whole
> document and regenerates it this can be very valuable debug
> information. Consider that if the stuff you parse is not in your file
> system, but comes from a stream from a remote server, it isn't at all
> obvious what it looks like.

Good point.

>  
> > Note also that currently RuleManager can return prebuilt lists when
> > match is called; no List object needs instantiating. However if "always
> > present" actions have to be inserted into each list, then a new List
> > object is required to be created for each match call.
> 
> I understand what you say, but do not understand why a new list would
> have to be built with each match call. Why can't you statically add
> the "always present" action into the list? Could you explain?

Possible, I guess. Just a bit tricky...

Regards,

Simon



---------------------------------------------------------------------

Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

Hi Simon!

On Thu, 03 Feb 2005 23:57:30 +1300, Simon Kitching <sk...@apache.org> wrote:
> I look forward to seeing your ideas on stringifying trees of elements.

Isn't it about time to give Digester2 a place in SVN, so I can either
create patches against it or  directly commit to it. What about a
branch in commons proper? Or at least the sandbox?

> > > I've not thought too much about obj->xml, and anyway Betwixt has that
> > > reasonably well covered as far as I know.
> >
> > The xmlio out part is much less than obj->xml, but rather a set of
> > helpers on a low level. It also addresses byte encodings which has not
> > been thought of in many XML printing solutions.
> 
> Hmm.. not sure what to do with this code, then. But I'm pretty sure
> Digester is not the right home for it...

Agreed. Maybe a commons component of its own? Very small code, but
reasonable in scope. Many people need XML printing....

Thoughts?

> >
> > > If you mean having some debug Action that is triggered *for every
> > > element seen* in addition to the ones whose patterns actually match,
> > > then that can be done fairly easily by subclassing a Rules (in
> > > digester1.x) or RuleManager (in digester2.x) class. I guess we could
> > > build it in to the default class though...
> >
> > This would fit into the xmlio matching above: have an action that is
> > called unconditionally. This could be useful in many scenarios.
> > Shouldn't this be part of the default rule manager?
> >
> 
> There are usecases for having a set of actions that is returned if no
> pattern is matched. In particular, it is nice to be able to generate an
> error, "unrecognised element", if you are very fussy about the input. I
> would definitely like to add this to DefaultRuleManager. And this
> feature would fit the xmlio scenario fine.

Agreed.

> Having a set of actions that are returned *in addition to* any others is
> possibly more controversial. There was someone looking for exactly that
> on the digester user list a while ago, wanting to execute
> SetNestedPropertiesRule for each element. I'm not so convinced this is a
> good idea, though: seems awful easy to shoot yourself in the foot!
> 
> Apart from the "debugging" scenario you mention, I can't see a usecase
> for having an action that is returned *in addition to the other matching

When you populate beans or call methods on classes I would agree it
is rather a hazard. But if you think of an XML document as some sort
of message, why shouldn't there be more than one part of a complex
system that is interested in it? I need to think this over. Maybe it
is time to do some coding and experiments now....

> actions*. And I generally do debugging by enabling commons-logging
> output rather than write custom debugging actions anyway. Can you think
> of some usecases where this would be useful?

Hmmm, using SAX it is always a bit tricky to get a good idea of what the
XML document that is being parsed *really* looks like. commons-logging
is no good in that case. If you have something that collects the whole
document and regenerates it, this can be very valuable debug
information. If the stuff you parse is not in your file system
but comes as a stream from a remote server, it isn't at all obvious what
it looks like.

> Note also that currently RuleManager can return prebuilt lists when
> match is called; no List object needs instantiating. However if "always
> present" actions have to be inserted into each list, then a new List
> object is required to be created for each match call.

I understand what you say, but do not understand why a new list would
have to be built with each match call. Why can't you statically add
the "always present" action into the list? Could you explain?
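To make the question concrete, here is a minimal sketch of what "statically adding" could look like. It assumes exact-match patterns only and a configuration phase that ends before parsing starts. All names (PrebuiltListSketch, addAlways, freeze) are hypothetical and not taken from the posted Digester2 code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the posted Digester2 code.
class PrebuiltListSketch {
    interface Action { String name(); }

    private final Map<String, List<Action>> rules = new LinkedHashMap<>();
    private final List<Action> always = new ArrayList<>();
    private Map<String, List<Action>> prebuilt = new HashMap<>();
    private List<Action> noMatch = Collections.emptyList();

    void addRule(String pattern, Action action) {
        rules.computeIfAbsent(pattern, k -> new ArrayList<>()).add(action);
    }

    void addAlways(Action action) {
        always.add(action);
    }

    // Build every list once, after configuration and before parsing starts.
    void freeze() {
        Map<String, List<Action>> built = new HashMap<>();
        for (Map.Entry<String, List<Action>> e : rules.entrySet()) {
            List<Action> merged = new ArrayList<>(e.getValue());
            merged.addAll(always);
            built.put(e.getKey(), Collections.unmodifiableList(merged));
        }
        prebuilt = built;
        noMatch = Collections.unmodifiableList(new ArrayList<>(always));
    }

    // No List is allocated here; the same prebuilt instance is returned each time.
    List<Action> match(String path) {
        List<Action> hit = prebuilt.get(path);
        return hit != null ? hit : noMatch;
    }

    static String names(List<Action> actions) {
        StringBuilder sb = new StringBuilder();
        for (Action a : actions) {
            if (sb.length() > 0) sb.append(',');
            sb.append(a.name());
        }
        return sb.toString();
    }

    static PrebuiltListSketch demo() {
        PrebuiltListSketch m = new PrebuiltListSketch();
        m.addRule("/order/item", () -> "createItem");
        m.addAlways(() -> "debugLog");
        m.freeze();
        return m;
    }

    public static void main(String[] args) {
        PrebuiltListSketch m = demo();
        System.out.println(names(m.match("/order/item")));
        System.out.println(names(m.match("/order/unknown")));
    }
}
```

If several patterns can match the same path (wildcards, relative patterns), the merged list may depend on the path and not only on the pattern. That could be a reason for building a list per match call. The sketch does not cover that case.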

Oliver


Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

Hi Oliver,

I look forward to seeing your ideas on stringifying trees of elements.

> > 
> > The Rule (Action) classes interact with domain-specific (user) classes
> > via BeanUtils and reflection. I don't see any alternative, except for
> > the "pre-processor" type xml mapping tools, or runtime bytecode
> > generation, neither of which are really Digester's domain.
> 
> Well, there it is, my reflection. So we had a misunderstanding. The
> options you name are worse than reflection, I agree, but why use
> BeanUtils in the first place? Isn't plain reflection sufficient?

Well, I don't believe pre-processing is "worse" than digester; it can be
a great solution in some situations. And for the rest, there's
Digester :-)

Digester uses BeanUtils to do type-conversion (via its ConvertUtils
component), converting the strings extracted from the xml to whatever
types the target methods take.
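As an illustration of the kind of conversion meant here, the following is a minimal stdlib-only sketch of a string-to-type converter registry. It is not the BeanUtils implementation. In BeanUtils the corresponding entry point is ConvertUtils.convert(String, Class). The class name ConvertSketch and the set of registered types are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the role ConvertUtils plays; not BeanUtils code.
class ConvertSketch {
    private static final Map<Class<?>, Function<String, Object>> CONVERTERS = new HashMap<>();

    static {
        CONVERTERS.put(String.class, s -> s);
        CONVERTERS.put(int.class, s -> Integer.valueOf(s));
        CONVERTERS.put(Integer.class, s -> Integer.valueOf(s));
        CONVERTERS.put(boolean.class, s -> Boolean.valueOf(s));
        CONVERTERS.put(Boolean.class, s -> Boolean.valueOf(s));
        CONVERTERS.put(double.class, s -> Double.valueOf(s));
        CONVERTERS.put(Double.class, s -> Double.valueOf(s));
    }

    // Converts text taken from the xml into the type a target method expects.
    static Object convert(String text, Class<?> targetType) {
        Function<String, Object> converter = CONVERTERS.get(targetType);
        if (converter == null) {
            throw new IllegalArgumentException("no converter for " + targetType.getName());
        }
        return converter.apply(text.trim());
    }

    public static void main(String[] args) {
        System.out.println(convert("8080", int.class));
        System.out.println(convert(" true ", boolean.class));
    }
}
```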

BeanUtils also treats DynaBean classes as if they were normal Java
classes, which is needed for at least one very important Digester user:
struts.

The reflection stuff we use from BeanUtils is only a few dozen lines so
I guess we could import that into Digester itself. However the
ConvertUtils stuff has a lot of code for type conversion that I would be
reluctant to duplicate. Maybe it's worth having a look at the new
"morph" project as an alternative; it's more tightly focussed on
type conversion than BeanUtils.

> > 
> > Hmm.. If we had a class that implements RuleManager that always returns
> > a custom Action no matter what the path, then all events would be
> > forwarded to the user-provided action, where the user can call
> >    context.getMatchPath()
> > to access the current path, and determine from there what operations to
> > perform.
[snip]
> > 
> > Thoughts?
> 
> Looks good. However, we would need code that does the same as the
> default rule manager  in getMatchingActions to match relative paths as
> well. xmlio uses the same path syntax as digester2 anyway.
> 
> I will provide something for this as well.

Excellent!

> > I've not thought too much about obj->xml, and anyway Betwixt has that
> > reasonably well covered as far as I know.
> 
> The xmlio out part is much less than obj->xml, but rather a set of
> helpers on a low level. It also addresses byte encodings which has not
> been thought of in many XML printing solutions.

Hmm.. not sure what to do with this code, then. But I'm pretty sure
Digester is not the right home for it...

> 
> > If you mean having some debug Action that is triggered *for every
> > element seen* in addition to the ones whose patterns actually match,
> > then that can be done fairly easily by subclassing a Rules (in
> > digester1.x) or RuleManager (in digester2.x) class. I guess we could
> > build it in to the default class though...
> 
> This would fit into the xmlio matching above: have an action that is
> called unconditionally. This could be useful in many scenarios.
> Shouldn't this be part of the default rule manager?
>  

There are usecases for having a set of actions that is returned if no
pattern is matched. In particular, it is nice to be able to generate an
error, "unrecognised element", if you are very fussy about the input. I
would definitely like to add this to DefaultRuleManager. And this
feature would fit the xmlio scenario fine.
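A minimal sketch of the fallback idea, assuming exact-match patterns and a hypothetical setFallbackActions method. This is not the DefaultRuleManager API, just an illustration of returning a separate list when nothing matches.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; method and class names are invented for illustration.
class FallbackSketch {
    interface Action { void begin(String path); }

    private final Map<String, List<Action>> rules = new HashMap<>();
    private List<Action> fallback = Collections.emptyList();

    void addRule(String pattern, Action action) {
        rules.computeIfAbsent(pattern, k -> new ArrayList<>()).add(action);
    }

    void setFallbackActions(List<Action> actions) {
        fallback = Collections.unmodifiableList(new ArrayList<>(actions));
    }

    // Returns the registered actions, or the fallback list if no pattern matched.
    List<Action> match(String path) {
        List<Action> hit = rules.get(path);
        return hit != null ? hit : fallback;
    }

    // Fires the matched actions for one element path; returns "ok" or the error text.
    static String process(FallbackSketch manager, String path) {
        try {
            for (Action a : manager.match(path)) {
                a.begin(path);
            }
            return "ok";
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    static FallbackSketch strictManager() {
        FallbackSketch m = new FallbackSketch();
        m.addRule("/config/server", path -> { /* normal processing would go here */ });
        Action reject = path -> {
            throw new IllegalArgumentException("unrecognised element: " + path);
        };
        m.setFallbackActions(Collections.singletonList(reject));
        return m;
    }

    public static void main(String[] args) {
        System.out.println(process(strictManager(), "/config/server"));
        System.out.println(process(strictManager(), "/config/sever"));
    }
}
```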

Having a set of actions that are returned *in addition to* any others is
possibly more controversial. There was someone looking for exactly that
on the digester user list a while ago, wanting to execute
SetNestedPropertiesRule for each element. I'm not so convinced this is a
good idea, though: seems awful easy to shoot yourself in the foot!

Apart from the "debugging" scenario you mention, I can't see a usecase
for having an action that is returned *in addition to the other matching
actions*. And I generally do debugging by enabling commons-logging
output rather than write custom debugging actions anyway. Can you think
of some usecases where this would be useful?

Note also that currently RuleManager can return prebuilt lists when
match is called; no List object needs instantiating. However if "always
present" actions have to be inserted into each list, then a new List
object is required to be created for each match call.

Cheers,

Simon



Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Thu, 03 Feb 2005 15:38:30 +1300, Simon Kitching <sk...@apache.org> wrote:
> On Thu, 2005-02-03 at 02:11 +0100, Oliver Zeigermann wrote:
> > On Thu, 03 Feb 2005 11:39:01 +1300, Simon Kitching <sk...@apache.org> wrote:
> > > > I was also wondering, there may be occasions where it is desirable to
> > > > have the full body *including tags*  passed in a call back. This would
> > > > mostly apply in mixed context tags where text is mixed with style
> > > > information that does not need processing, as with XHTML.
> > >
> > > You mean stringify the child elements too, like XSLT does if you ask for
> > > the text of a mixed-content element?
> >
> > Yes.
> >
> > > I suppose we could do this, though I am not entirely sure how much use
> > > this would be. Can you think of a use-case?
> >
> > Think of the transformation of our web pages. There is structure
> > information wrapping pure XHTML. You would not want a callback for all
> > formatting tags, would you? Maybe this is not a very common use of
> > Digester, though...
> 
> Ok, I see. It would be reasonably simple to implement; we already
> calculate the full text for each element (so we can pass it to the body
> methods) in the SAXHandler class; we just need to keep appending these
> instead of discarding them when the element ends.
> 
> One issue, I guess, is that by the end of the document we have a
> StringBuffer that contains the entire text for the entire document -
> which might take up a bit of memory. So maybe we need some mechanism for
> an Action to tell the SAXHandler [from its begin() method, via a mixin
> interface, or otherwise] that it wants a full text tree. The SAXHandler
> can then start accumulating.
> 
> If you wished to contribute such a patch, I think I'd be in favour of
> it.

I agree and will contribute such a patch. I will think about such a
mechanism and will discuss it as soon as I have something.

> > Is that so? I have no internal knowledge of beanutils, but I thought
> > there is no other way of calling a parameterized method than by
> > reflection methods. But I am happy to learn something here :)
> 
> Just some minor misunderstanding I think..
> 
> The digester framework invokes Rule (Action) classes directly. There is
> no reflection involved in the invocation of Rule (Action) classes.

I know. But I was thinking of ActionCallMethod having code like 

            Object result = MethodUtils.invokeMethod(
                    target, methodName,
                    paramValues, paramTypes);            
            
Isn't that done by reflection?
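For comparison, a minimal sketch of the same kind of call using only java.lang.reflect. The Order class and the invoke helper are hypothetical. Note that Class.getMethod needs the exact declared parameter types, whereas MethodUtils.invokeMethod is, as far as I know, more lenient about finding a compatible method.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of a by-name method call with plain reflection.
class PlainReflectionSketch {
    public static class Order {
        private final StringBuilder log = new StringBuilder();

        public void addItem(String sku, Integer quantity) {
            log.append(sku).append('x').append(quantity);
        }

        public String summary() {
            return log.toString();
        }
    }

    // Looks up a public method by name and exact parameter types, then calls it.
    static Object invoke(Object target, String methodName,
                         Object[] paramValues, Class<?>[] paramTypes) {
        try {
            Method m = target.getClass().getMethod(methodName, paramTypes);
            return m.invoke(target, paramValues);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot invoke " + methodName, e);
        }
    }

    static String demo() {
        Order order = new Order();
        invoke(order, "addItem",
               new Object[] {"A-17", 3},
               new Class<?>[] {String.class, Integer.class});
        return order.summary();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```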
 
> I am proposing that xmlrules actually uses reflection to generate a set
> of Action objects when parsing its rule configuration input file. Of
> course the parsing of the actual user input would then be done in the
> normal manner (with the digester framework calling the Actions
> directly).
> 
> The Rule (Action) classes interact with domain-specific (user) classes
> via BeanUtils and reflection. I don't see any alternative, except for
> the "pre-processor" type xml mapping tools, or runtime bytecode
> generation, neither of which are really Digester's domain.

Well, there it is, my reflection. So we had a misunderstanding. The
options you name are worse than reflection, I agree, but why use
BeanUtils in the first place? Isn't plain reflection sufficient?

> > >
> > > I remember the main issue being that Digester is built around the
> > > concept of having patterns control what operations were executed for
> > > each xml element, and having the invoked logic partitioned into many
> > > small Rule classes.
> > >
> > > You wished the user to write a big switch statement in Java to determine
> > > what operations were executed, as you felt that this was more natural to
> > > people used to writing SAX code by hand.
> > >
> > > We did briefly discuss ways of layering the code so that these were two
> > > possible options the user could choose between, but I couldn't see then
> > > how this would be possible.
> >
> > Thanks for reminding me of my reservations :) Now I remember!
> > Especially when writing rather simple import code I think it is much
> > easier and more obvious to have all the code in one place instead of
> > having it distributed over many classes. However, this seems to be
> > rather simple to accomplish. You just register a single action to be
> > matched for all elements and then ask the context for the
> > path of the current element. Maybe also have a convenience method to match
> > paths against the current element directly.
> >
> > Wouldn't this work?
> 
> Hmm.. If we had a class that implements RuleManager that always returns
> a custom Action no matter what the path, then all events would be
> forwarded to the user-provided action, where the user can call
>    context.getMatchPath()
> to access the current path, and determine from there what operations to
> perform.
> 
> // xmlio-style digester
> Action myHandler = new AbstractAction() {
>   public void begin(
>    Context context,
>    String namespace, String name,
>    Attributes attrs) {
> 
>     String path = context.getMatchPath();
>     if (path.equals("......")) {
>         ....
>     } else {
>         ....
>     }
>   }
> 
>   public void body(...) {
>   }
> }
> 
> RuleManager xmlioRuleManager = new XMLIORuleManager(myHandler);
> Digester d  = new Digester();
> d.setRuleManager(xmlioRuleManager);
> 
> Thoughts?

Looks good. However, we would need code that does the same as the
default rule manager  in getMatchingActions to match relative paths as
well. xmlio uses the same path syntax as digester2 anyway.

I will provide something for this as well.
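A minimal sketch of such a path check. It assumes that a pattern with a leading slash is absolute and that any other pattern is relative, i.e. matches the trailing elements of the current path. The thread does not spell out the syntax, so this is an assumption, and PathMatchSketch.matches is a hypothetical helper.

```java
// Hypothetical helper; the path syntax is an assumption, not taken from the thread.
class PathMatchSketch {
    // Absolute patterns must equal the whole path; relative ones match its tail.
    static boolean matches(String pattern, String path) {
        if (pattern.startsWith("/")) {
            return path.equals(pattern);
        }
        // Prefixing "/" keeps "b" from matching the element "ab".
        return path.endsWith("/" + pattern);
    }

    public static void main(String[] args) {
        System.out.println(matches("/a/b", "/a/b"));
        System.out.println(matches("b", "/a/b"));
        System.out.println(matches("b", "/a/ab"));
    }
}
```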

> > > If you can think of some way of merging these quite different
> > > approaches, I'm very keen to hear it. Or if you feel more kindly toward
> > > a "distributed" pattern-matching + Action class approach, then that
> > > would resolve the major issue and we can look at how the other xmlio
> > > features could be provided in Digester (well, we can do that anyway!).
> >
> > Are you thinking of the export features?
> 
> No, just wondering in general if there is stuff that can be merged.

I will check which internals or options can be taken over. In general
the xmlio trick was to avoid having too many features, so that it stays easy to use.

> I've not thought too much about obj->xml, and anyway Betwixt has that
> reasonably well covered as far as I know.

The xmlio out part is much less than obj->xml, but rather a set of
helpers on a low level. It also addresses byte encodings which has not
been thought of in many XML printing solutions.

> If you mean having some debug Action that is triggered *for every
> element seen* in addition to the ones whose patterns actually match,
> then that can be done fairly easily by subclassing a Rules (in
> digester1.x) or RuleManager (in digester2.x) class. I guess we could
> build it in to the default class though...

This would fit into the xmlio matching above: have an action that is
called unconditionally. This could be useful in many scenarios.
Shouldn't this be part of the default rule manager?
 
> Thanks by the way for all your comments. It's great to know other people
> are interested in a digester2...

Thanks for taking them seriously and letting me participate in the
Digester2 design :)

Oliver


Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Thu, 2005-02-03 at 02:11 +0100, Oliver Zeigermann wrote:
> On Thu, 03 Feb 2005 11:39:01 +1300, Simon Kitching <sk...@apache.org> wrote:
> > > I was also wondering, there may be occasions where it is desirable to
> > > have the full body *including tags*  passed in a call back. This would
> > > mostly apply in mixed context tags where text is mixed with style
> > > information that does not need processing, as with XHTML.
> > 
> > You mean stringify the child elements too, like XSLT does if you ask for
> > the text of a mixed-content element?
> 
> Yes.
>  
> > I suppose we could do this, though I am not entirely sure how much use
> > this would be. Can you think of a use-case?
> 
> Think of the transformation of our web pages. There is structure
> information wrapping pure XHTML. You would not want a callback for all
> formatting tags, would you? Maybe this is not a very common use of
> Digester, though...

Ok, I see. It would be reasonably simple to implement; we already
calculate the full text for each element (so we can pass it to the body
methods) in the SAXHandler class; we just need to keep appending these
instead of discarding them when the element ends.

One issue, I guess, is that by the end of the document we have a
StringBuffer that contains the entire text for the entire document -
which might take up a bit of memory. So maybe we need some mechanism for
an Action to tell the SAXHandler [from its begin() method, via a mixin
interface, or otherwise] that it wants a full text tree. The SAXHandler
can then start accumulating.
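A minimal sketch of on-demand accumulation, written against plain SAX rather than the Digester2 SAXHandler. The element name passed to the constructor is a stand-in for an Action asking for a full text tree from its begin() method. FullBodySketch and capture are hypothetical names. The markup is regenerated from the SAX events, so it is equivalent to the input but not guaranteed to be character-for-character identical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: buffers markup only while inside the requested element.
class FullBodySketch extends DefaultHandler {
    private final String captureElement;
    private final StringBuilder buffer = new StringBuilder();
    private int depth = 0;      // 0 means "not accumulating"
    private String captured;

    FullBodySketch(String captureElement) {
        this.captureElement = captureElement;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (depth > 0) {
            // Child of the captured element: regenerate its start tag.
            buffer.append('<').append(qName);
            for (int i = 0; i < attrs.getLength(); i++) {
                buffer.append(' ').append(attrs.getQName(i))
                      .append("=\"").append(escape(attrs.getValue(i))).append('"');
            }
            buffer.append('>');
            depth++;
        } else if (qName.equals(captureElement)) {
            depth = 1;          // start accumulating
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            buffer.append(escape(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 0) {
            return;
        }
        depth--;
        if (depth == 0) {
            captured = buffer.toString();
            buffer.setLength(0);    // release the text once it has been handed over
        } else {
            buffer.append("</").append(qName).append('>');
        }
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    static String capture(String xml, String element) {
        try {
            FullBodySketch handler = new FullBodySketch(element);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.captured;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(capture("<page><body>Some <b>bold</b> text</body></page>", "body"));
    }
}
```

Because the buffer is only filled between the start and end of the requested element, memory use is bounded by the size of that subtree and not by the whole document.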

If you wished to contribute such a patch, I think I'd be in favour of
it.

> 
> > If you mean pass a DOM tree into the Action to represent the "full body"
> > content, I think not :-).
> 
> Certainly not. I think there is no place for the DOM in Digester.

Phew! :-)

> 
> > > > Or are you by chance referring to my suggestions for xml-rules?
> > >
> > > No, what are they?
> > 
> > I was puzzled about your reference to "reflection" in the previous
> > email, as accessing Rule (now Action) classes is never done via
> > reflection. However in the RELEASE-NOTES.txt I do discuss possible
> > updates to the classes in the xmlrules package to use reflection to make
> > Action classes accessible via the xmlrules mapping file rather than have
> > the xmlrules java code contain an explicit mapping class for each Action
> > as is currently done.
> 
> Is that so? I have no internal knowledge of beanutils, but I thought
> there is no other way of calling a parameterized method than by
> reflection methods. But I am happy to learn something here :)

Just some minor misunderstanding I think..

The digester framework invokes Rule (Action) classes directly. There is
no reflection involved in the invocation of Rule (Action) classes.

I am proposing that xmlrules actually uses reflection to generate a set
of Action objects when parsing its rule configuration input file. Of
course the parsing of the actual user input would then be done in the
normal manner (with the digester framework calling the Actions
directly).

The Rule (Action) classes interact with domain-specific (user) classes
via BeanUtils and reflection. I don't see any alternative, except for
the "pre-processor" type xml mapping tools, or runtime bytecode
generation, neither of which are really Digester's domain.

> > 
> > I remember the main issue being that Digester is built around the
> > concept of having patterns control what operations were executed for
> > each xml element, and having the invoked logic partitioned into many
> > small Rule classes.
> > 
> > You wished the user to write a big switch statement in Java to determine
> > what operations were executed, as you felt that this was more natural to
> > people used to writing SAX code by hand.
> > 
> > We did briefly discuss ways of layering the code so that these were two
> > possible options the user could choose between, but I couldn't see then
> > how this would be possible.
> 
> Thanks for reminding me of my reservations :) Now I remember!
> Especially when writing rather simple import code I think it is much
> easier and more obvious to have all the code in one place instead of
> having it distributed over many classes. However, this seems to be
> rather simple to accomplish. You just register a single action to be
> matched for all elements and then ask the context for the
> path of the current element. Maybe also have a convenience method to match
> paths against the current element directly.
> 
> Wouldn't this work?

Hmm.. If we had a class that implements RuleManager that always returns
a custom Action no matter what the path, then all events would be
forwarded to the user-provided action, where the user can call
   context.getMatchPath()
to access the current path, and determine from there what operations to
perform.

// xmlio-style digester
Action myHandler = new AbstractAction() {
  public void begin(
   Context context, 
   String namespace, String name,
   Attributes attrs) {

    String path = context.getMatchPath();
    if (path.equals("......")) {
        ....
    } else {
        ....
    }
  }

  public void body(...) {
  }
};

RuleManager xmlioRuleManager = new XMLIORuleManager(myHandler);
Digester d  = new Digester();
d.setRuleManager(xmlioRuleManager);
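The snippet above relies on classes that are not available outside the proposal (XMLIORuleManager in particular is only an idea at this point). As a runnable illustration of the same "one handler, dispatch on the current path" style, here is a minimal sketch on plain SAX. It does not use Digester's Context or RuleManager, and SinglePathHandlerSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a single handler tracks the path and branches on it.
class SinglePathHandlerSketch extends DefaultHandler {
    private final StringBuilder path = new StringBuilder();
    private final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        path.append('/').append(qName);
        String current = path.toString();
        if (current.equals("/order")) {
            events.add("order");
        } else if (current.equals("/order/item")) {
            events.add("item:" + attrs.getValue("sku"));
        }
        // Any other element is ignored.
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Drop the "/qName" segment appended in startElement.
        path.setLength(path.length() - qName.length() - 1);
    }

    static String run(String xml) {
        try {
            SinglePathHandlerSketch handler = new SinglePathHandlerSketch();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return String.join(",", handler.events);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<order><item sku='A'/><note/><item sku='B'/></order>"));
    }
}
```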

Thoughts?

> 
> Speed is another issue with xmlio, as it is really fast. But with some
> optimizations geared towards this, digester shouldn't really be much
> slower anyway...

Hopefully...

>  
> > If you can think of some way of merging these quite different
> > approaches, I'm very keen to hear it. Or if you feel more kindly toward
> > a "distributed" pattern-matching + Action class approach, then that
> > would resolve the major issue and we can look at how the other xmlio
> > features could be provided in Digester (well, we can do that anyway!).
> 
> Are you thinking of the export features?

No, just wondering in general if there is stuff that can be merged.
I've not thought too much about obj->xml, and anyway Betwixt has that
reasonably well covered as far as I know.

> 
> Thinking of the import features, having more than one action
> invoked on a given element would be essential. Just think of some
> sort of logging or debugging action that is triggered for every
> element alongside the normal processing. Does this currently work with
> digester 2?

Having multiple Rule (or Action) instances triggered for an element in
the input has always been supported, and definitely will be present in
digester2; it's critical.

If you mean having some debug Action that is triggered *for every
element seen* in addition to the ones whose patterns actually match,
then that can be done fairly easily by subclassing a Rules (in
digester1.x) or RuleManager (in digester2.x) class. I guess we could
build it in to the default class though...

Thanks by the way for all your comments. It's great to know other people
are interested in a digester2...

Regards,

Simon


Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Thu, 03 Feb 2005 11:39:01 +1300, Simon Kitching <sk...@apache.org> wrote:
> > I was also wondering, there may be occasions where it is desirable to
> > have the full body *including tags*  passed in a call back. This would
> > mostly apply in mixed context tags where text is mixed with style
> > information that does not need processing, as with XHTML.
> 
> You mean stringify the child elements too, like XSLT does if you ask for
> the text of a mixed-content element?

Yes.
 
> I suppose we could do this, though I am not entirely sure how much use
> this would be. Can you think of a use-case?

Think of the transformation of our web pages. There is structure
information wrapping pure XHTML. You would not want a callback for all
formatting tags, would you? Maybe this is not a very common use of
Digester, though...

> If you mean pass a DOM tree into the Action to represent the "full body"
> content, I think not :-).

Certainly not. I think there is no place for the DOM in Digester.

> > > Or are you by chance referring to my suggestions for xml-rules?
> >
> > No, what are they?
> 
> I was puzzled about your reference to "reflection" in the previous
> email, as accessing Rule (now Action) classes is never done via
> reflection. However in the RELEASE-NOTES.txt I do discuss possible
> updates to the classes in the xmlrules package to use reflection to make
> Action classes accessible via the xmlrules mapping file rather than have
> the xmlrules java code contain an explicit mapping class for each Action
> as is currently done.

Is that so? I have no internal knowledge of beanutils, but I thought
there is no other way of calling a parameterized method than by
reflection methods. But I am happy to learn something here :)
 
> >
> > > >
> > > > If so I would be more than happy to abandon xmlio (in) as - apart from
> > > > philosophical considerations - it would be superfluous and I would
> > > > offer development support for digester if that is welcome.
> > >
> > > You would be very welcome indeed to work on digester if you wish.
> > >
> > > My memory of our discussions about xmlio/digester is a little vague now,
> > > but I remember coming to the conclusion that their concepts were
> > > different in some fundamental ways. If we *can* find some way to merge
> > > the two projects, though, I'm all for it. Does the fact that Digester
> > > and SAXHandler have been split apart make this possible now?
> >
> > Honestly, I do not remember much of that discussion, but I thought we
> > came to the conclusion that we would try to make xmlio obsolete with
> > Digester2. The reason I preferred xmlio over digester was simplicity
> > and obviousness mainly. Now this new Digester2 core (even better with
> > the Action subclasses in a package of their own) is simple and obvious
> > as well, so I see no strong reason to stick to xmlio.
> 
> That would be very cool.
> 
> I remember the main issue being that Digester is built around the
> concept of having patterns control what operations were executed for
> each xml element, and having the invoked logic partitioned into many
> small Rule classes.
> 
> You wished the user to write a big switch statement in Java to determine
> what operations were executed, as you felt that this was more natural to
> people used to writing SAX code by hand.
> 
> We did briefly discuss ways of layering the code so that these were two
> possible options the user could choose between, but I couldn't see then
> how this would be possible.

Thanks for reminding me of my reservations :) Now I remember!
Especially when writing rather simple import code I think it is much
easier and more obvious to have all the code in one place instead of
having it distributed over many classes. However, this seems to be
rather simple to accomplish. You just register a single action to be
matched for all elements and then ask the context for the
path of the current element. Maybe also have a convenience method to match
paths against the current element directly.

Wouldn't this work?

Speed is another issue with xmlio, as it is really fast. But with some
optimizations geared towards this, digester shouldn't really be much
slower anyway...
 
> If you can think of some way of merging these quite different
> approaches, I'm very keen to hear it. Or if you feel more kindly toward
> a "distributed" pattern-matching + Action class approach, then that
> would resolve the major issue and we can look at how the other xmlio
> features could be provided in Digester (well, we can do that anyway!).

Are you thinking of the export features?

Thinking of the import features, having more than one action
invoked on a given element would be essential. Just think of some
sort of logging or debugging action that is triggered for every
element alongside the normal processing. Does this currently work with
digester 2?

Oliver


Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

Hi Oliver,

On Wed, 2005-02-02 at 15:22 +0100, Oliver Zeigermann wrote:
> On Wed, 02 Feb 2005 14:48:42 +1300, Simon Kitching <sk...@apache.org> wrote:
> > > - Wouldn't it be possible (and even desirable) to have a more general
> > > Pattern class instead of a String in Digester#addRule?
> > Can you explain more?
> 
> Well, RuleManager is an abstract class (discussion abstract class vs.
> interface applies here as well) with a default implementation, but
> Pattern is a String. Wouldn't it be more flexible with little extra
> cost to have a Pattern interface with a default String Path
> implementation like the current one?

Well, I would prefer to avoid having users do:
  addRule(new Pattern("/foo/bar"), ....)
as this is just more readable:
  addRule("/foo/bar", ...)

However if we ever do find that there are some patterns that just can't
be represented nicely as a string, then we could simply add a new
method:
  addRule(Pattern p, ...) { ....}
and reimplement addRule to preserve compatibility:
  addRule(String s, ... ) { addRule(new Pattern(s), ...); }

So in short, I would prefer [1] to keep the current String pattern as
one of the options, for user convenience, but I don't see any major
issue with adding Patterns later if we need them. I guess it would break
custom subclasses of RuleManager, but that would be a very rare thing to
do.

And right now, DefaultRuleManager definitely needs its patterns to be
strings, so if we had a Pattern class as the pattern, we would be
forcing users to create an instance just so the DefaultRuleManager could
turn it back into a string.
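A minimal sketch of the overload approach described above, in which the String form stays as a convenience and delegates to a Pattern form. Pattern is shown here as a hypothetical one-method interface, and action names are plain strings to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of keeping addRule(String, ...) next to addRule(Pattern, ...).
class PatternOverloadSketch {
    interface Pattern { boolean matches(String path); }

    private final List<Pattern> patterns = new ArrayList<>();
    private final List<String> actionNames = new ArrayList<>();

    void addRule(Pattern pattern, String actionName) {
        patterns.add(pattern);
        actionNames.add(actionName);
    }

    // Convenience form: an exact-path pattern built from the string.
    void addRule(String path, String actionName) {
        Pattern exact = candidate -> candidate.equals(path);
        addRule(exact, actionName);
    }

    List<String> match(String path) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matches(path)) {
                hits.add(actionNames.get(i));
            }
        }
        return hits;
    }

    static List<String> demo(String path) {
        PatternOverloadSketch m = new PatternOverloadSketch();
        m.addRule("/foo/bar", "createBar");
        Pattern underFoo = p -> p.startsWith("/foo/");
        m.addRule(underFoo, "logFoo");
        return m.match(path);
    }

    public static void main(String[] args) {
        System.out.println(demo("/foo/bar"));
        System.out.println(demo("/foo/baz"));
    }
}
```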

>  
> > > - I like the bodySegment vs. body design :)
> > Cool. Now I just have to implement it :-)
> 
> Ooops, doesn't it work, yet? 

Minor detail. I just need to merge the code from the example I
referenced into the core. Why are there never enough hours in a day?

> 
> > 
> > The inspiration can be found in the digester 1.6
> > "src/examples/api/document-markup" example, where the code has to go to
> > great lengths to handle XHTML-style input.
> 
> I was also wondering, there may be occasions where it is desirable to
> have the full body *including tags*  passed in a call back. This would
> mostly apply in mixed context tags where text is mixed with style
> information that does not need processing, as with XHTML.

You mean stringify the child elements too, like XSLT does if you ask for
the text of a mixed-content element?

I suppose we could do this, though I am not entirely sure how much use
this would be. Can you think of a use-case?

If you mean pass a DOM tree into the Action to represent the "full body"
content, I think not :-).

> > > - I like the no dependency and digester2 package approach
> > 
> > Ok. I really thought people wouldn't like "o.a.c.digester2". In fact,
> > I'm not sure I like it myself. The main reasons are:
> > (1) that I don't know any other projects that do this. Tomcat, struts,
> >    commons-collections, etc, don't do this.
> 
> Tomcat does not need to as it is no library. commons-collections
> should better have done this - for more details have a look at the
> thread all this was discussed in recently.

Yes, I remember that thread. I'll re-read it.

> > As noted, there is still currently a dependency on BeanUtils; digester
> > uses too much from that package to copy the classes into digester. But
> > as noted I would like to experiment with accessing BeanUtils
> > functionality via a custom classloader so that if people have problems
> > with clashing lib versions there is a solution.
> 
> Could you elaborate this?

Suppose digester requires beanutils 1.7, but a user wants to call
digester from an app that is using beanutils 1.6 (or 1.8) or similar,
and the beanutils lib versions are incompatible. 

In this situation, the user is currently out of luck (or at least there
is no documented solution).

But using classloaders it is possible to access classes different from
the classes available to other parts of an app. For example, webapps in
tomcat have their own private libs that are not available to either
tomcat or sibling webapps. Using this sort of trick, we could arrange
for digester to access all the beanutils classes via a user-provided
classloader, which accesses a beanutils-1.7.jar that is not in the
classpath for the rest of the app.

I haven't really thought about this in detail; it's just an idea at the
moment. I'm vaguely envisaging a method
   Digester.setLibraryClassLoader(ClassLoader cl)
or
   Digester.setLibraryClasspath(String customClasspath)

It might end up better to load the whole of Digester in a custom
classloader, in which case the problem is pushed back up to the user
domain; all we would need to do is document how to do this rather than
actually add any custom code.
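A minimal sketch of the first variant, assuming a setLibraryClassLoader method as floated above. LibraryLoaderSketch and its other methods are hypothetical. The demo passes no jar URLs and loads a JDK class so that it runs anywhere. In real use the URL would point at the private beanutils jar, and the loaded classes would have to be called reflectively or through interfaces that both sides can see.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical sketch of resolving library classes through a user-provided loader.
class LibraryLoaderSketch {
    private ClassLoader libraryLoader = LibraryLoaderSketch.class.getClassLoader();

    void setLibraryClassLoader(ClassLoader cl) {
        this.libraryLoader = cl;
    }

    // Instantiates a library class by name via the configured loader.
    Object newLibraryObject(String className) {
        try {
            Class<?> type = Class.forName(className, true, libraryLoader);
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    // A loader that sees the given jars but not the application classpath.
    static ClassLoader isolated(URL... jars) {
        ClassLoader aboveAppClasspath = ClassLoader.getSystemClassLoader().getParent();
        return new URLClassLoader(jars, aboveAppClasspath);
    }

    static Object demo() {
        LibraryLoaderSketch sketch = new LibraryLoaderSketch();
        sketch.setLibraryClassLoader(isolated());
        return sketch.newLibraryObject("java.util.ArrayList");
    }

    public static void main(String[] args) {
        System.out.println(demo().getClass().getName());
    }
}
```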

> 
> > I quite like Emmanuel Bourg's suggestion of an "actions" subpackage to
> > hold the subclasses of Action, which would show that they aren't tightly
> > coupled to the Digester "core" classes.
> 
> That's exactly what I would want to see. 

Well, it's done. I hope to post the new version later today.

> 
> > Or are you by chance referring to my suggestions for xml-rules?
> 
> No, what are they?

I was puzzled about your reference to "reflection" in the previous
email, as accessing Rule (now Action) classes is never done via
reflection. However in the RELEASE-NOTES.txt I do discuss possible
updates to the classes in the xmlrules package to use reflection to make
Action classes accessible via the xmlrules mapping file rather than have
the xmlrules java code contain an explicit mapping class for each Action
as is currently done.
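
To make the reflection idea concrete, here is a minimal sketch, assuming the
mapping file would carry a fully qualified class name. The Action stand-in,
LogElementAction and ActionLoader are hypothetical names invented for this
example, not classes from the posted code.

```java
/** Simplified stand-in for the proposed Action base class. */
abstract class Action {
    void begin(String elementName) { }
}

/** Hypothetical Action that a mapping file might name. */
class LogElementAction extends Action {
    @Override
    void begin(String elementName) {
        System.out.println("begin " + elementName);
    }
}

/** Hypothetical loader: turns a class name from a mapping file into an Action. */
class ActionLoader {
    static Action newAction(String className) {
        try {
            // asSubclass rejects classes that do not extend Action.
            return Class.forName(className)
                    .asSubclass(Action.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Not a usable Action class: " + className, e);
        }
    }
}
```

With something like this, the xmlrules code would need no per-Action mapping
class; the cost is that a typo in the mapping file is only detected at runtime.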

>  
> > >
> > > If so I would be more than happy to abandon xmlio (in) as - apart from
> > > philosophical considerations - it would be superfluous and I would
> > > offer development support for digester if that is welcome.
> > 
> > You would be very welcome indeed to work on digester if you wish.
> > 
> > My memory of our discussions about xmlio/digester is a little vague now,
> > but I remember coming to the conclusion that their concepts were
> > different in some fundamental ways. If we *can* find some way to merge
> > the two projects, though, I'm all for it. Does the fact that Digester
> > and SAXHandler have been split apart make this possible now?
> 
> Honestly, I do not remember much of that discussion, but I thought we
> came to the conclusion that we would try to make xmlio obsolete with
> Digester2. The reason I preferred xmlio over digester was simplicity
> and obviousness mainly. Now this new Digester2 core (even better with
> the Action subclasses in a package of their own) is simple and obvious
> as well, so I see no strong reason to stick to xmlio.

That would be very cool.

I remember the main issue being that Digester is built around the
concept of having patterns control what operations were executed for
each xml element, and having the invoked logic partitioned into many
small Rule classes.

You wished the user to write a big switch statement in Java to determine
what operations were executed, as you felt that this was more natural to
people used to writing SAX code by hand.
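
The two styles can be sketched side by side as follows. This is an
illustration only: it uses plain JAXP/SAX rather than either Digester or
xmlio, and the element names, the "sku" attribute and the DispatchStyles class
are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DispatchStyles {

    /** Style 1: one handler, one switch, all logic in a single place. */
    static List<String> withSwitch(String xml) {
        List<String> events = new ArrayList<>();
        parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                switch (qName) {
                    case "order": events.add("new order"); break;
                    case "item":  events.add("item " + atts.getValue("sku")); break;
                    default: break; // ignore everything else
                }
            }
        });
        return events;
    }

    /** Style 2: element paths mapped to small, independent pieces of logic. */
    static List<String> withPatterns(String xml) {
        List<String> events = new ArrayList<>();
        Map<String, Consumer<Attributes>> rules = new HashMap<>();
        rules.put("/order", atts -> events.add("new order"));
        rules.put("/order/item", atts -> events.add("item " + atts.getValue("sku")));

        parse(xml, new DefaultHandler() {
            private final StringBuilder path = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                path.append('/').append(qName);
                Consumer<Attributes> rule = rules.get(path.toString());
                if (rule != null) {
                    rule.accept(atts);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                // Drop the "/qName" segment appended in startElement.
                path.setLength(path.length() - qName.length() - 1);
            }
        });
        return events;
    }

    private static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
    }
}
```

Both methods produce the same events for the same input; the difference is
only in where the dispatch decision lives.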

We did briefly discuss ways of layering the code so that these were two
possible options the user could choose between, but I couldn't see then
how this would be possible.

If you can think of some way of merging these quite different
approaches, I'm very keen to hear it. Or if you feel more kindly toward
a "distributed" pattern-matching + Action class approach, then that
would resolve the major issue and we can look at how the other xmlio
features could be provided in Digester (well, we can do that anyway!).

Cheers,

Simon

[1] Of course the decision is by consensus, not my preference!

Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Wed, 02 Feb 2005 14:48:42 +1300, Simon Kitching <sk...@apache.org> wrote:
> > - Wouldn't it be possible (and even desirable) to have a more general
> > Pattern class instead of a String in Digester#addRule?
> Can you explain more?

Well, RuleManager is an abstract class (the abstract class vs. interface
discussion applies here as well) with a default implementation, but
Pattern is a String. Wouldn't it be more flexible, at little extra
cost, to have a Pattern interface with a default String path
implementation like the current one?
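
A minimal sketch of what such an interface might look like. Both type names
and the matching rule (leading slash means exact match, otherwise match on
trailing path segments) are assumptions invented for illustration; the posted
code takes a plain String.

```java
/** Hypothetical interface; not part of the posted Digester2 code. */
interface Pattern {
    /** Returns true if this pattern selects the given element path, e.g. "/order/item". */
    boolean matches(String elementPath);
}

/** Default implementation backed by a String path. */
final class PathPattern implements Pattern {
    private final String path;

    PathPattern(String path) {
        this.path = path;
    }

    @Override
    public boolean matches(String elementPath) {
        if (path.startsWith("/")) {
            // Absolute pattern: the whole path must be identical.
            return elementPath.equals(path);
        }
        // Relative pattern: must match a whole trailing run of segments.
        return elementPath.endsWith("/" + path);
    }
}
```

Other implementations (regular expressions, namespace-aware matching) could
then be plugged in without changing the method that registers rules.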

> > - I like the bodySegment vs. body design :)
> Cool. Now I just have to implement it :-)

Oops, doesn't it work yet?

> 
> The inspiration can be found in the digester 1.6
> "src/examples/api/document-markup" example, where the code has to go to
> great lengths to handle XHTML-style input.

I was also wondering: there may be occasions where it is desirable to
have the full body *including tags* passed in a callback. This would
mostly apply to mixed-content tags where text is interleaved with style
markup that does not need processing, as in XHTML.
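
One way this could work, sketched with plain SAX rather than any Digester API:
while inside the element of interest, nested start and end tags are written
back into the text buffer instead of being dispatched. MarkupCapture is a
hypothetical name, and the sketch deliberately drops attributes and does not
re-escape characters.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the body of each target element, nested tags included. */
class MarkupCapture extends DefaultHandler {
    private final String target;
    private final StringBuilder body = new StringBuilder();
    private final List<String> bodies = new ArrayList<>();
    private int depth; // 0 = outside the target, 1 = directly inside it

    MarkupCapture(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (depth > 0) {
            body.append('<').append(qName).append('>'); // attributes omitted for brevity
            depth++;
        } else if (qName.equals(target)) {
            depth = 1;
            body.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            body.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (depth == 0) {
            return;
        }
        depth--;
        if (depth == 0) {
            bodies.add(body.toString()); // the "callback" with the full body
        } else {
            body.append("</").append(qName).append('>');
        }
    }

    static List<String> capture(String xml, String target) {
        MarkupCapture handler = new MarkupCapture(target);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parse failed", e);
        }
        return handler.bodies;
    }
}
```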

> > - I like the no dependency and digester2 package approach
> 
> Ok. I really thought people wouldn't like "o.a.c.digester2". In fact,
> I'm not sure I like it myself. The main reasons are:
> (1) that I don't know any other projects that do this. Tomcat, struts,
>    commons-collections, etc, don't do this.

Tomcat does not need to, as it is not a library. commons-collections
would have done better to do this; for more details have a look at the
thread in which all this was discussed recently.

> (2) that upgrading an application using digester 1.x to digester2.x
>     requires changes to all the import statements.

I understand Digester2 is incompatible with 1.x anyway, so changes to
the import statements aren't the primary problem, right? If it was
fully compatible, there would be no need to call the package digester2
anyway.

> As noted, there is still currently a dependency on BeanUtils; digester
> uses too much from that package to copy the classes into digester. But
> as noted I would like to experiment with accessing BeanUtils
> functionality via a custom classloader so that if people have problems
> with clashing lib versions there is a solution.

Could you elaborate on this?

> I quite like Emmanuel Bourg's suggestion of an "actions" subpackage to
> hold the subclasses of Action, which would show that they aren't tightly
> coupled to the Digester "core" classes.

That's exactly what I would want to see. 

> Or are you by chance referring to my suggestions for xml-rules?

No, what are they?

> >
> > If so I would be more than happy to abandon xmlio (in) as - apart from
> > philosophical considerations - it would be superfluous and I would
> > offer development support for digester if that is welcome.
> 
> You would be very welcome indeed to work on digester if you wish.
> 
> My memory of our discussions about xmlio/digester is a little vague now,
> but I remember coming to the conclusion that their concepts were
> different in some fundamental ways. If we *can* find some way to merge
> the two projects, though, I'm all for it. Does the fact that Digester
> and SAXHandler have been split apart make this possible now?

Honestly, I do not remember much of that discussion, but I thought we
came to the conclusion that we would try to make xmlio obsolete with
Digester2. The reason I preferred xmlio over digester was simplicity
and obviousness mainly. Now this new Digester2 core (even better with
the Action subclasses in a package of their own) is simple and obvious
as well, so I see no strong reason to stick to xmlio.

Oliver

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

Hi Oliver, 

On Tue, 2005-02-01 at 18:04 +0100, Oliver Zeigermann wrote:
> I very much like that and think it really is straightforward.
> 
> Comments:
> - Why is Action an abstract class?

So that we can later add new functionality to Action without breaking
custom Action subclasses that users have written. As long as we can
provide a suitable default implementation in the Action abstract class,
everything runs smoothly.

One example is the "bodySegment" callback that is now in Action. In
Digester 1.x we could not have added this to Rule without breaking all
custom Rule classes. But if digester2.0 had been released without it, we
could have added it later with no source or binary compatibility
problems.
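
A small sketch of the point, using simplified stand-ins rather than the posted
classes. Only bodySegment is named in this thread; the other method names and
CountElementsAction are assumptions for illustration.

```java
/** Simplified stand-in for the proposed Action class. */
abstract class Action {
    void begin(String elementName) { }
    void body(String text) { }
    void end(String elementName) { }

    /**
     * Imagine this was added in a later release. Because it has an empty
     * default body, subclasses written earlier still compile and link.
     */
    void bodySegment(String text) { }
}

/** A user-written subclass from before bodySegment existed; it needs no changes. */
class CountElementsAction extends Action {
    int count;

    @Override
    void begin(String elementName) {
        count++;
    }
}
```

Had Action been an interface (in the Java of the time), adding bodySegment
would have forced every implementing class to be edited and recompiled.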

Of course because of Java's single-inheritance policy, it would be
impossible for a class to extend both Action and some other class. But
(a) this is extremely unlikely, and (b) using an "adapter" class works
around this anyway if it absolutely has to be done.
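
The adapter workaround might look like this; every class name here is
hypothetical and Action is again a simplified stand-in.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the proposed Action class. */
abstract class Action {
    void begin(String elementName) { }
}

/** Some unrelated base class the user's code is obliged to extend. */
class AuditedComponent {
    final List<String> log = new ArrayList<>();

    void audit(String event) {
        log.add(event);
    }
}

/** The user's class keeps its own superclass and simply exposes a callback. */
class OrderTracker extends AuditedComponent {
    void elementStarted(String name) {
        audit("start " + name);
    }
}

/** Adapter: extends Action and forwards to the class that cannot. */
class OrderTrackerAction extends Action {
    private final OrderTracker target;

    OrderTrackerAction(OrderTracker target) {
        this.target = target;
    }

    @Override
    void begin(String elementName) {
        target.elementStarted(elementName);
    }
}
```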

> - Wouldn't it be possible (and even desirable) to have a more general
> Pattern class instead of a String in Digester#addRule?
Can you explain more?

> - I like the bodySegment vs. body design :)
Cool. Now I just have to implement it :-)

The inspiration can be found in the digester 1.6
"src/examples/api/document-markup" example, where the code has to go to
great lengths to handle XHTML-style input.

> - I like the no dependency and digester2 package approach

Ok. I really thought people wouldn't like "o.a.c.digester2". In fact,
I'm not sure I like it myself. The main reasons are:
(1) that I don't know any other projects that do this. Tomcat, struts,
   commons-collections, etc, don't do this.
(2) that upgrading an application using digester 1.x to digester2.x 
    requires changes to all the import statements.

As noted, there is still currently a dependency on BeanUtils; digester
uses too much from that package to copy the classes into digester. But
as noted I would like to experiment with accessing BeanUtils
functionality via a custom classloader so that if people have problems
with clashing lib versions there is a solution.

> - It's no secret that I am no fan of reflection stuff: is it really
> necessary to have the subclasses of Action be part of the *very*,
> *very* digester *core*?

Sorry, I don't follow this. Could you explain?

One thing the proposed code does do is separate ActionFactory from
Digester, so the Digester class doesn't have compile-time dependencies
on any Action subclasses.

I quite like Emmanuel Bourg's suggestion of an "actions" subpackage to
hold the subclasses of Action, which would show that they aren't tightly
coupled to the Digester "core" classes.

Or are you by chance referring to my suggestions for xml-rules?

> 
> If so I would be more than happy to abandon xmlio (in) as - apart from
> philosophical considerations - it would be superfluous and I would
> offer development support for digester if that is welcome.

You would be very welcome indeed to work on digester if you wish. 

My memory of our discussions about xmlio/digester is a little vague now,
but I remember coming to the conclusion that their concepts were
different in some fundamental ways. If we *can* find some way to merge
the two projects, though, I'm all for it. Does the fact that Digester
and SAXHandler have been split apart make this possible now?

Regards,

Simon

Re: [digester] initial code for Digester2.0

Posted by Oliver Zeigermann <ol...@gmail.com>.

I very much like that and think it really is straightforward.

Comments:
- Why is Action an abstract class?
- Wouldn't it be possible (and even desirable) to have a more general
Pattern class instead of a String in Digester#addRule?
- I like the bodySegment vs. body design :)
- I like the no dependency and digester2 package approach
- It's no secret that I am no fan of reflection stuff: is it really
necessary to have the subclasses of Action be part of the *very*,
*very* digester *core*?

If so I would be more than happy to abandon xmlio (in) as - apart from
philosophical considerations - it would be superfluous and I would
offer development support for digester if that is welcome.

Oliver

On Mon, 31 Jan 2005 23:09:28 +1300, Simon Kitching <sk...@apache.org> wrote:
> Hi,
> 
> As I mentioned a few months ago, I've been working on some ideas for
> Digester 2.0. I've put some code and notes up on
>   http://www.apache.org/~skitching
> 
> Comments from all commons-dev subscribers are welcome, but particularly
> from Craig and Robert.
> 
> The RELEASE-NOTES.txt file gives a brief overview of what I've done so
> far, and what I personally would like to see.
> 
> This is *not* intended to be final code, but rather to solicit yes/no
> feedback on what people like/dislike about the posted code. As you will
> see, many parts are still missing and I personally would still like to
> see significant changes even to parts already included (see
> RELEASE-NOTES.txt). However the basic structure is there, including a
> number of controversial (I expect) name changes.
> 
> Once we get the general opinions out, and I have massaged the code into
> something that meets general consensus I hope to then add it to the
> sandbox for everyone to hack away at.
> 
> Cheers,
> 
> Simon
> 

Re: [digester] initial code for Digester2.0

Posted by Simon Kitching <sk...@apache.org>.

On Mon, 2005-01-31 at 22:20 +0000, robert burrell donkin wrote: 
> hi simon
> 
> my main development machine blew up last week and i'm still struggling 
> to get up and running on a secondary one.
> 
> i haven't had a chance to look at the code yet (and it might be a fair 
> while before i do) but i'd like to suggest that (when the time comes) 
> you consider developing in proper rather than the sandbox. subversion 
> provides a number of options which weren't available in cvs.

No hurry on having a look at the code. However I have posted javadoc for
the new code here:
  http://www.apache.org/~skitching/digester2-javadoc/api/index.html

So while you're waiting for your new machine, you've now got something
to do Robert :-)

Re developing digester2 in proper: well, it really depends upon whether
there is consensus on the ideas I am putting forward. If people are
unsure, and want to see a more complete framework before saying yea/nay
then sandbox might be more appropriate. If we all agree on the basics,
then proper would be fine.

But yes, it's so much easier to manage branches with svn. Of course
there's no problem either with using "svn cp" to copy from
digester-proper into the sandbox, ie make the sandbox contain a branch
of Digester, right? [go subversion!]

Cheers,

Simon

Re: [digester] initial code for Digester2.0

Posted by robert burrell donkin <rd...@apache.org>.

hi simon

my main development machine blew up last week and i'm still struggling 
to get up and running on a secondary one.

i haven't had a chance to look at the code yet (and it might be a fair 
while before i do) but i'd like to suggest that (when the time comes) 
you consider developing in proper rather than the sandbox. subversion 
provides a number of options which weren't available in cvs.

- robert

On 31 Jan 2005, at 10:09, Simon Kitching wrote:

> Hi,
>
> As I mentioned a few months ago, I've been working on some ideas for
> Digester 2.0. I've put some code and notes up on
>   http://www.apache.org/~skitching
>
> Comments from all commons-dev subscribers are welcome, but particularly
> from Craig and Robert.
>
> The RELEASE-NOTES.txt file gives a brief overview of what I've done so
> far, and what I personally would like to see.
>
> This is *not* intended to be final code, but rather to solicit yes/no
> feedback on what people like/dislike about the posted code. As you will
> see, many parts are still missing and I personally would still like to
> see significant changes even to parts already included (see
> RELEASE-NOTES.txt). However the basic structure is there, including a
> number of controversial (I expect) name changes.
>
> Once we get the general opinions out, and I have massaged the code into
> something that meets general consensus I hope to then add it to the
> sandbox for everyone to hack away at.
>
> Cheers,
>
> Simon
>
>

