You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Oliver Zeigermann <ol...@gmail.com> on 2005/02/10 19:08:21 UTC

[digester2] Additions

Folks,

as I noticed Simon has done so much work on Digester2, I just wanted
to be sure that my scheduled additions still are appropriate and
aligned to the overall design. Here they are:

(1) XMLIORuleManager: A rule manager that takes an action and
unconditionally calls it on any event. It's useful to imitate xmlio
style processing, but not limited to this. One application might be
logging. As it is more general any proposal for a better name?

Quoting code sample from Simon:

> // xmlio-style digester
> Action myHandler = new AbstractAction() {
>   public void begin(
>    Context context,
>    String namespace, String name,
>    Attributes attrs) {
>
>     String path = context.getMatchPath();
>     if (path.equals("......")) {
>         ....
>     } else {
>         ....
>     }
>   }
>
>   public void body(...) {
>   }
> }
>
> RuleManager xmlioRuleManager = new XMLIORuleManager(myHandler);
> Digester d  = new Digester();
> d.setRuleManager(xmlioRuleManager);
>

As an option it might be possible to register an action with any rule
manager that gets called unconditionally - epsecially for logging and
debugging this can be really helpful. However,  I wasn't quite sure if
Simon was happy with that.

(2) Next to the body and bodySegment call back methods there might be
one that gives you the complete body of an element as a string
composed of all character and markup data. This might be useful when
you want to verbosely take over formatted mixed content like in XHTML.

For performance reasons there should be some mechanism that tells
digester when such content is needed on an element basis. Maybe
something returned by the begin method or - maybe cleaner - something
returned by an additional call back directly called after begin like
boolean requiresComposedBody(String namespace, String name) or
anything similar. This method could alternatively be part of an
additional or extended interface.

Comments?

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Simon Kitching <sk...@apache.org>.

On Tue, 2005-02-15 at 01:35 +0100, Oliver Zeigermann wrote:
> After I have thought about it, I am +1 to do it the way you proposed
> initially: have it all in one class. That's not because I am
> frustrated or am no longer interested, but I think that's a good
> solution and causes the least controversy. Go ahead with it please.

Ok, done.

I've used the term "mandatoryAction" rather than "supplementaryAction"
as it's equivalent, shorter and hopefully more familiar to non-english
speakers.

I've moved the path-matching functionality from SupplementaryRuleManager
into a new class SimpleMatchUtils. And of course new class 
  SimpleMatchUtilsTestCase
now contains the basic xmlio demonstration adapted from the old
  SupplementaryRuleManagerTestCase.

Cheers,

Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Tue, 15 Feb 2005 11:27:29 +1300, Simon Kitching <sk...@apache.org> wrote:
> On Mon, 2005-02-14 at 17:24 +0100, Oliver Zeigermann wrote:
> > On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching <sk...@apache.org> wrote:
> > > > - DefaultRuleManager now returns an unmodifiable list upon getMatchingActions
> > >
> > > I'm not so sure about this one.
> > >
> > > This change means a new object is created each time getMatchingActions
> > > is called, and that is once for every xml element in the input document.
> > > And in addition, each method call is "relayed" by the UnmodifiableList
> > > wrapper to the underlying one, resulting in a performance hit too. Is
> > > this significant in the context of parsing an xml document? I don't
> > > know; maybe it's not measurable.
> > >
> > > However the benefits of this change are really limited only to
> > > preventing bad code in SAXHandler or RuleManager classes. Cases where
> > > users write their own versions of those will be extremely rare. And
> > > mistakes in the core digester classes are, I think, likely to be picked
> > > up by our unit tests.
> > >
> > > So my current feeling is that while the above change **is theoretically
> > > correct**, I would leave it out and return the actual list, just
> > > documenting that it should be treated as unmodifiable.
> > >
> >
> > Well, think of the situation where people inherit and do things like I
> > have done on my first try. So, I would be all for keeping it in, but
> > would not protest if you removed it.
> 
> I said above: "Cases where users write their own versions of those will
> be extremely rare". I really believe that the number of people writing
> their own custom RuleManager class will be about 3, worldwide, over the
> next decade. But the number of people who get impacted by the
> (admittedly small) performance hit for the proposed change is 100% of
> users.
> 
> I guess we have a tie here: +1 from you, and -1 from me. The best thing
> would be if we got a third opinion from either Craig or Robert.
> 
> Otherwise, we could just have a remote game of bzflag
> (http://bzflag.org/screenshots/bzfi0026.jpg) or something, with winner
> takes all :-)

I am not +1. I am +0. I have reverted that change.
 
> [snip]
> 
> > >
> > > The ReadOnlyConcatList is cute. I don't think it would save any time or
> > > effort over
> > >   List list = new ArrayList(l1.size() + l2.size());
> > >   list.appendAll(l1);
> > >   list.appendAll(l2);
> > > but I guess the unmodifiable feature is thrown in for free.
> >
> > ReadOnlyConcatList certainly would be faster and have a smaller memory
> > footprint.
> 
> Can you describe why ReadOnlyConcatList is faster and smaller?
> 
> The array list is (in c terms) just an array of pointers to Actions.
> 
> To create a new one, I am sure the ArrayList constructor does:
>  * malloc a "manager" object with a few ints (size, capacity)
>    and a reference to the data block.
>  * malloc the data block, which is capacity*sizeof(reference)
>  * for each entry in the source list, copy reference to new list [1] [2]
> Once the new ArrayList has been created, access is very fast.
> [1] The list is likely to be 1, 2, or 3 elements. Having more than 3
> Actions match a particular input xml element would be very unusual..
> [2] There is even the possibility that ArrayList has been optimised to
> do a "memcpy" if its source is an ArrayList.
> 
> With the ReadOnlyConcatList, the work done is:
>  * malloc a block of data for the ReadOnlyConcatList, which contains one
> int and the two references to ArrayList objects.
>  * copy 2 references and call size once
> And accesses are fractionally slower, because get invokes the
> ReadOnlyConcatList, which performs an if-statement then invokes the get
> method on the appropriate underlying list.
> 
> So on the positive side I think that ReadOnlyConcatList is marginally
> smaller (it doesn't need the data block, which is typically 4-12 bytes),
> and slightly more efficient to create (one less malloc, a few less
> copies of references)
> 
> On the negative, the get method is slightly slower to use. It also adds
> a dozen lines to the source and an extra .class file to the jar.
> 
> All in all, we are debating microseconds and handfuls of bytes here.
> Both seem *acceptable* solutions to me, though I do prefer simple in
> this case. And like the Unmodifiable discussion, the easiest resolution
> would be if some other Digester developer (eg Robert or Craig) would
> give their call, then we could get on with it.

No problem, really, simply replace it.

> If neither Robert nor Craig offer their opinion, then I'm willing to
> settle for the following compromise. It won't make either of us 100%
> happy, but I don't think there's anything we can't live with, either,
> and we can move forward again:
>  * merge "fallback" rules into RuleManager and DefaultRuleManager,
>    and remove FallbackRuleManager
>  * leave SupplementaryRuleManager exactly as you wrote it, with
>    caching and ReadOnlyConcatList (except for the changes needed
>    due to removal of FallbackRuleManager)
>  * remove the UnmodifiableList stuff from DefaultRuleManager
> 
> Note also that while I won't deliberately change the RuleManager
> interface to cause problems for SupplementaryRuleManager, I won't
> hesitate to change it if it is needed to add the more complex matching
> features I described, and fixing SupplementaryRuleManager will be up to
> you...
> 
> Would that be ok by you?

After I have thought about it, I am +1 to do it the way you proposed
initially: have it all in one class. That's not because I am
frustrated or am no longer interested, but I think that's a good
solution and causes the least controversy. Go ahead with it please.

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Simon Kitching <sk...@apache.org>.

On Mon, 2005-02-14 at 17:24 +0100, Oliver Zeigermann wrote:
> On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching <sk...@apache.org> wrote:
> > > - DefaultRuleManager now returns an unmodifiable list upon getMatchingActions
> > 
> > I'm not so sure about this one.
> > 
> > This change means a new object is created each time getMatchingActions
> > is called, and that is once for every xml element in the input document.
> > And in addition, each method call is "relayed" by the UnmodifiableList
> > wrapper to the underlying one, resulting in a performance hit too. Is
> > this significant in the context of parsing an xml document? I don't
> > know; maybe it's not measurable.
> > 
> > However the benefits of this change are really limited only to
> > preventing bad code in SAXHandler or RuleManager classes. Cases where
> > users write their own versions of those will be extremely rare. And
> > mistakes in the core digester classes are, I think, likely to be picked
> > up by our unit tests.
> > 
> > So my current feeling is that while the above change **is theoretically
> > correct**, I would leave it out and return the actual list, just
> > documenting that it should be treated as unmodifiable.
> > 
> 
> Well, think of the situation where people inherit and do things like I
> have done on my first try. So, I would be all for keeping it in, but
> would not protest if you removed it.

I said above: "Cases where users write their own versions of those will
be extremely rare". I really believe that the number of people writing
their own custom RuleManager class will be about 3, worldwide, over the
next decade. But the number of people who get impacted by the
(admittedly small) performance hit for the proposed change is 100% of
users.

I guess we have a tie here: +1 from you, and -1 from me. The best thing
would be if we got a third opinion from either Craig or Robert.

Otherwise, we could just have a remote game of bzflag
(http://bzflag.org/screenshots/bzfi0026.jpg) or something, with winner
takes all :-)

[snip]

> > 
> > The ReadOnlyConcatList is cute. I don't think it would save any time or
> > effort over
> >   List list = new ArrayList(l1.size() + l2.size());
> >   list.appendAll(l1);
> >   list.appendAll(l2);
> > but I guess the unmodifiable feature is thrown in for free.
> 
> ReadOnlyConcatList certainly would be faster and have a smaller memory
> footprint.

Can you describe why ReadOnlyConcatList is faster and smaller?

The array list is (in c terms) just an array of pointers to Actions.

To create a new one, I am sure the ArrayList constructor does:
 * malloc a "manager" object with a few ints (size, capacity) 
   and a reference to the data block.
 * malloc the data block, which is capacity*sizeof(reference)
 * for each entry in the source list, copy reference to new list [1] [2]
Once the new ArrayList has been created, access is very fast.
[1] The list is likely to be 1, 2, or 3 elements. Having more than 3
Actions match a particular input xml element would be very unusual..
[2] There is even the possibility that ArrayList has been optimised to
do a "memcpy" if its source is an ArrayList.

With the ReadOnlyConcatList, the work done is:
 * malloc a block of data for the ReadOnlyConcatList, which contains one
int and the two references to ArrayList objects.
 * copy 2 references and call size once
And accesses are fractionally slower, because get invokes the
ReadOnlyConcatList, which performs an if-statement then invokes the get
method on the appropriate underlying list.

So on the positive side I think that ReadOnlyConcatList is marginally
smaller (it doesn't need the data block, which is typically 4-12 bytes),
and slightly more efficient to create (one less malloc, a few less
copies of references)

On the negative, the get method is slightly slower to use. It also adds
a dozen lines to the source and an extra .class file to the jar.

All in all, we are debating microseconds and handfuls of bytes here.
Both seem *acceptable* solutions to me, though I do prefer simple in
this case. And like the Unmodifiable discussion, the easiest resolution
would be if some other Digester developer (eg Robert or Craig) would
give their call, then we could get on with it.

>  
> > I'm also thinking about the possibility that we may move to a
> > RuleManager style that matches based on much richer state than just the
> > current path. FOr example, if we want to support patterns like this:
> >  "/foo/bar[@id='me']/baz[pos()==last()]"    [1]
> > then getMatchingActions will definitely not be taking a String path any
> > more, but instead an array of w3c.dom.Element objects or similar so it
> > has sufficient info to decide whether the pattern matches or not. That
> > won't affect many places in the code, but would make this caching
> > impossible to implement so we'd have to take it out then anyway.
> 
> We can worry about this as soon as getMatchingActions takes anything
> else than a String. Right now the method signature that I implemented
> says it is a String, so there is no problem.

It just seems a shame for you to write code, javadoc, and unit tests for
something that may have to be thrown away in order to support changes to
the RuleManager class that are in the roadmap.

[snip simon's counter-proposal]

> OK, if you really want to do this, I won't stop you.

I really do want you (and as many others as possible) working on
digester2. More ideas is definitely better, and anyway it's boring
working on a project alone. So I'm very reluctant indeed to commit my
solution just because "no-one will stop me".

I think a "new" project is always going to have more debate on these
sorts of things than an established one. And having two contributors
makes it trickier. We just need a little patience, and a willingness to
do lots of back-and-forth discussion until conventions for the project
are established...

If neither Robert nor Craig offer their opinion, then I'm willing to
settle for the following compromise. It won't make either of us 100%
happy, but I don't think there's anything we can't live with, either,
and we can move forward again:
 * merge "fallback" rules into RuleManager and DefaultRuleManager,
   and remove FallbackRuleManager
 * leave SupplementaryRuleManager exactly as you wrote it, with
   caching and ReadOnlyConcatList (except for the changes needed
   due to removal of FallbackRuleManager)
 * remove the UnmodifiableList stuff from DefaultRuleManager

Note also that while I won't deliberately change the RuleManager
interface to cause problems for SupplementaryRuleManager, I won't
hesitate to change it if it is needed to add the more complex matching
features I described, and fixing SupplementaryRuleManager will be up to
you...

Would that be ok by you?

Cheers,

Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Oliver Zeigermann <ol...@gmail.com>.

On Tue, 15 Feb 2005 01:08:53 +1300, Simon Kitching <sk...@apache.org> wrote:
> > - DefaultRuleManager now returns an unmodifiable list upon getMatchingActions
> 
> I'm not so sure about this one.
> 
> This change means a new object is created each time getMatchingActions
> is called, and that is once for every xml element in the input document.
> And in addition, each method call is "relayed" by the UnmodifiableList
> wrapper to the underlying one, resulting in a performance hit too. Is
> this significant in the context of parsing an xml document? I don't
> know; maybe it's not measurable.
> 
> However the benefits of this change are really limited only to
> preventing bad code in SAXHandler or RuleManager classes. Cases where
> users write their own versions of those will be extremely rare. And
> mistakes in the core digester classes are, I think, likely to be picked
> up by our unit tests.
> 
> So my current feeling is that while the above change **is theoretically
> correct**, I would leave it out and return the actual list, just
> documenting that it should be treated as unmodifiable.
> 

Well, think of the situation where people inherit and do things like I
have done on my first try. So, I would be all for keeping it in, but
would not protest if you removed it.

> > - created a new FallbackRuleManager class that only takes care about
> > fallback actions
> 
> [see my comments at end first!]
> 
> Have you considered writing this class as a *decorator* rather than a
> subclass of DefaultRuleManager? If it was a decorator, it could then be
> applied to any rulemanager class, not just the default. It would be used
> as follows:
>   RuleManager realrm = new DefaultRuleManager(); // or any other type
>   RuleManager rm = new FallbackRuleManagerDecorator(realrm);
>   Digester digester = new Digester(rm);
> 
> The disadvantage would be that it would be necessary to implement
> each of the RuleManager methods in order to forward them to the
> decorated instance, together with the performance hit that would entail.
> [ah, if only this were Ruby....]
> 
> > - SupplementaryRuleManager now takes account of the explicitely
> > unmodifiable list returned by DefaultRuleManager and does a wild
> > mixture of caching and concatenating the list returned from
> > DefaultRuleManager and its own list of supplementary actions
> 
> [see my comments at end first!]
> 
> The ReadOnlyConcatList is cute. I don't think it would save any time or
> effort over
>   List list = new ArrayList(l1.size() + l2.size());
>   list.appendAll(l1);
>   list.appendAll(l2);
> but I guess the unmodifiable feature is thrown in for free.

ReadOnlyConcatList certainly would be faster and have a smaller memory
footprint.

> I've been thinking about the caching and see a mild problem. If wildcard
> pattern "a" (equivalent to "*/a" in digester1), and <a> elements are
> scattered around in various places in the input document, then the cache
> is going to grow fairly large.
> 
> So basically, whether caching improve things or not depends upon whether
> the input xml is simple and repetitive (caching wins) or reasonably
> unstructured (caching loses). And given we can't tell which is more
> likely over the set of digester users (at least I don't have a clue) I
> would probably prefer the simplest solution - not to cache.

If the cache gets too large we can easily limit its size using LRUMap
or something.
 
> I'm also thinking about the possibility that we may move to a
> RuleManager style that matches based on much richer state than just the
> current path. FOr example, if we want to support patterns like this:
>  "/foo/bar[@id='me']/baz[pos()==last()]"    [1]
> then getMatchingActions will definitely not be taking a String path any
> more, but instead an array of w3c.dom.Element objects or similar so it
> has sufficient info to decide whether the pattern matches or not. That
> won't affect many places in the code, but would make this caching
> impossible to implement so we'd have to take it out then anyway.

We can worry about this as soon as getMatchingActions takes anything
else than a String. Right now the method signature that I implemented
says it is a String, so there is no problem.

> You're probably going to hit me for this :-)

I can't, you are too far away ;)
 
> After some thought, I think my preferred option would be to include
> "fallback" and "supplementary" features as standard, ie define them in
> the RuleManager interface, provide an implementation in the
> DefaultRuleManager class, and to remove the FallbackRuleManager and
> SupplementaryRuleManager classes completely.
> 
> I've always said I'm happy for fallback features to be in
> DefaultRuleManager, and after thought I believe it ought to be defined
> as part of the main RuleManager interface. And as you're so keen for
> supplementary actions, I'm ok with adding them too as long as there is
> no performance hit for people who don't use them. Note that this would
> definitely be the easiest solution from the user point of view, with no
> special classes required in order to use "supplementary" actions.
> 
> And I would go for the simplest implementation of "supplementary"
> actions, by simply creating a new list each time getMatchingActions is
> called when one or more supplementary actions are defined, without any
> clever caching. Your efforts at performance improvements for
> supplementary actions are impressive, but it's not clear to me that they
> would in the end work any better than just creating new lists and
> simpler is definitely better. I don't actually think there *is* a way to

I disagree. My efforts are relatively modest and pretty obvious and
actually bring performance gains.

> implement supplementary actions without a performance penalty except to
> integrate them deeply into the concrete RuleManager's data model for
> storing rules and I would be against that just because I'm not convinced
> they will be widely used.

The way I have done it now really has very limited performance
penalty. Maybe it is even faster than in DefaultRuleManager.

> But if we have them in the main API, and we later find that people *are*
> using them often then we can think about internal optimisations at that
> point.
> 
> The result would be that DefaultRuleManager instances without any
> supplementary actions defined would work without any performance
> penalty. Users who choose to use supplementary actions will pay a
> (fairly small) penalty. Digester developers will be happy because, while
> there is extra code required in each concrete RuleManager class in order
> to support supplementary actions, the code will be simple and obvious.

My intention was to have the supplementary action part work as fast as
in xmlio. So, why not keep it the way it is? Complexity is in another
class, so Digester developers should be happy as well.

> so here's what I would see for RuleManager interface:
> 
> public interface RuleManager
> {
>   public void addFallbackAction(Action action);
>   public void addFallbackActions(List actions);
>   public void addSupplementaryAction(Action action);
>   public void addSupplementaryActions(List actions);
> }
> 
> and for DefaultRuleManager.getMatchingActions(String path) {
>    ... current stuff ...
> 
>    if (actions == null) {
>     actions = fallbackActions;
>    }
> 
>    if (supplementaryActions != null) {
>      oldActions = actions;
>      actions = new ArrayList(
>         oldActions.size() + supplementaryActions.size();
>      actions.addAll(oldActions);
>      actions.addAll(supplementaryActions);
>    }
> 
>   return actions;
> }
> 

OK, if you really want to do this, I won't stop you.

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Simon Kitching <sk...@apache.org>.

Hi Oliver,

On Mon, 2005-02-14 at 08:16 +0100, Oliver Zeigermann wrote:
> [update]
> 
> - now all member variables of DefaultRuleManager are private -
> introduced copy ctor for copying members from subclasses

thanks - that looks good to me.

> - DefaultRuleManager now returns an unmodifiable list upon getMatchingActions

I'm not so sure about this one. 

This change means a new object is created each time getMatchingActions
is called, and that is once for every xml element in the input document.
And in addition, each method call is "relayed" by the UnmodifiableList
wrapper to the underlying one, resulting in a performance hit too. Is
this significant in the context of parsing an xml document? I don't
know; maybe it's not measurable. 

However the benefits of this change are really limited only to
preventing bad code in SAXHandler or RuleManager classes. Cases where
users write their own versions of those will be extremely rare. And
mistakes in the core digester classes are, I think, likely to be picked
up by our unit tests.

So my current feeling is that while the above change **is theoretically
correct**, I would leave it out and return the actual list, just
documenting that it should be treated as unmodifiable.

Comments?

> - created a new FallbackRuleManager class that only takes care about
> fallback actions

[see my comments at end first!]

Have you considered writing this class as a *decorator* rather than a
subclass of DefaultRuleManager? If it was a decorator, it could then be
applied to any rulemanager class, not just the default. It would be used
as follows:
  RuleManager realrm = new DefaultRuleManager(); // or any other type
  RuleManager rm = new FallbackRuleManagerDecorator(realrm);
  Digester digester = new Digester(rm);

The disadvantage would be that it would be necessary to implement
each of the RuleManager methods in order to forward them to the
decorated instance, together with the performance hit that would entail.
[ah, if only this were Ruby....]

> - SupplementaryRuleManager now takes account of the explicitely
> unmodifiable list returned by DefaultRuleManager and does a wild
> mixture of caching and concatenating the list returned from
> DefaultRuleManager and its own list of supplementary actions

[see my comments at end first!]

The ReadOnlyConcatList is cute. I don't think it would save any time or
effort over
  List list = new ArrayList(l1.size() + l2.size());
  list.appendAll(l1);
  list.appendAll(l2);
but I guess the unmodifiable feature is thrown in for free.

I've been thinking about the caching and see a mild problem. If wildcard
pattern "a" (equivalent to "*/a" in digester1), and <a> elements are
scattered around in various places in the input document, then the cache
is going to grow fairly large.

So basically, whether caching improve things or not depends upon whether
the input xml is simple and repetitive (caching wins) or reasonably
unstructured (caching loses). And given we can't tell which is more
likely over the set of digester users (at least I don't have a clue) I
would probably prefer the simplest solution - not to cache.

I'm also thinking about the possibility that we may move to a
RuleManager style that matches based on much richer state than just the
current path. FOr example, if we want to support patterns like this:
 "/foo/bar[@id='me']/baz[pos()==last()]"    [1]
then getMatchingActions will definitely not be taking a String path any
more, but instead an array of w3c.dom.Element objects or similar so it
has sufficient info to decide whether the pattern matches or not. That
won't affect many places in the code, but would make this caching
impossible to implement so we'd have to take it out then anyway.

[1] sorry my xpath is a bit rusty - this is probably not valid

> - Both FallbackRuleManager and SupplementaryRuleManager now can have
> several callback actions
> - Still docs and tests need improvement - will do so once the code
> seems somewhat stable
> 
> Comments?

You're probably going to hit me for this :-)

After some thought, I think my preferred option would be to include
"fallback" and "supplementary" features as standard, ie define them in
the RuleManager interface, provide an implementation in the
DefaultRuleManager class, and to remove the FallbackRuleManager and
SupplementaryRuleManager classes completely.

I've always said I'm happy for fallback features to be in
DefaultRuleManager, and after thought I believe it ought to be defined
as part of the main RuleManager interface. And as you're so keen for
supplementary actions, I'm ok with adding them too as long as there is
no performance hit for people who don't use them. Note that this would
definitely be the easiest solution from the user point of view, with no
special classes required in order to use "supplementary" actions.

And I would go for the simplest implementation of "supplementary"
actions, by simply creating a new list each time getMatchingActions is
called when one or more supplementary actions are defined, without any
clever caching. Your efforts at performance improvements for
supplementary actions are impressive, but it's not clear to me that they
would in the end work any better than just creating new lists and
simpler is definitely better. I don't actually think there *is* a way to
implement supplementary actions without a performance penalty except to
integrate them deeply into the concrete RuleManager's data model for
storing rules and I would be against that just because I'm not convinced
they will be widely used. 

But if we have them in the main API, and we later find that people *are*
using them often then we can think about internal optimisations at that
point.

The result would be that DefaultRuleManager instances without any
supplementary actions defined would work without any performance
penalty. Users who choose to use supplementary actions will pay a
(fairly small) penalty. Digester developers will be happy because, while
there is extra code required in each concrete RuleManager class in order
to support supplementary actions, the code will be simple and obvious.

so here's what I would see for RuleManager interface:

public interface RuleManager
{
  public void addFallbackAction(Action action);
  public void addFallbackActions(List actions);
  public void addSupplementaryAction(Action action);
  public void addSupplementaryActions(List actions);
}

and for DefaultRuleManager.getMatchingActions(String path) {
   ... current stuff ...

   if (actions == null) {
    actions = fallbackActions;
   }

   if (supplementaryActions != null) {
     oldActions = actions;
     actions = new ArrayList(
	oldActions.size() + supplementaryActions.size();
     actions.addAll(oldActions);
     actions.addAll(supplementaryActions);
   }

  return actions;
}

It would be tempting to put the fallback/supplementary login in
AbstractRuleManager but I think that would be a mistake. The point of
that abstract class is *not* to provide any default behaviour, but
simply to provide ourselves with a place to put hooks for "backward
compatibility" if we change the RuleManager interface. If we put logic
in here, we tempt people who don't want our version of the logic to
implement the RuleManager interface instead of the AbstractRuleManager
class, which we really don't want to encourage.

Over to you Oliver....

Cheers,

Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Oliver Zeigermann <ol...@gmail.com>.

[update]

- now all member variables of DefaultRuleManager are private -
introduced copy ctor for copying members from subclasses
- DefaultRuleManager now returns an unmodifiable list upon getMatchingActions
- created a new FallbackRuleManager class that only takes care about
fallback actions
- SupplementaryRuleManager now takes account of the explicitely
unmodifiable list returned by DefaultRuleManager and does a wild
mixture of caching and concatenating the list returned from
DefaultRuleManager and its own list of supplementary actions
- Both FallbackRuleManager and SupplementaryRuleManager now can have
several callback actions
- Still docs and tests need improvement - will do so once the code
seems somewhat stable

Comments?

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Oliver Zeigermann <ol...@gmail.com>.

> One design goal was to avoid using protected members. The problem here
> is that using protected for a *member variable* locks us into an
> implementation forever (a protected member is part of the "external" api
> that users can access). I think it preferable to add a public or
> protected method that provides access to the member rather than exposing
> the member directly. The method can later get the data in some manner
> other than reading from a member if we change the implementation. Note
> that having a method that returns a Map isn't a *whole* lot better. I
> think what we need is a "copy-helper" type method rather than exposing
> the namespace member just so a subclass can copy it. I can't think what
> the best solution is at the moment, but I feel there's an elegant (and
> probably fairly well known) solution nearby. How does clone() solve this
> issue?
> 
> Note that I have (I think) been rigorous about avoiding protected
> members in the o.a.c.digester2 package, but not so rigorous in the
> action subpackage. I think it probably is a good idea to apply this to
> the action package too, though.

In case those member variables really change this would be a general
redesign I guess. In this case you might just as well create a copy
from DefaultRuleManager maybe called NewDefaultRuleManager or
DefaultRuleManager2 or whatever. I agree that passing the information
using a method is no better. And having something like a copy helper
is not enough as people might want have access to DefaultRuleManager 
guts in other scenarios as well.
 
> ===
> 
> I would suggest that the match method be placed somewhere other than on
> the Path class. Digester could well provide multiple different matching
> implementations in future. I think a static method on some XMLIOUtils
> class might be appropriate. The user's hand-written Action can then
> choose whether to use that matching algorithm or some other. All that
> Path currently does is return a String representing the "canonical form"
> of the Path, and that seems clean & focussed to me.

OK. If you think so. Done.

> ===
> 
> This bit in SupplementaryRuleManager will cause problems I think:
>     public List getMatchingActions(String path) {
>         List actionList = super.getMatchingActions(path);
>         ...
>         if (supplementaryAction != null) {
>             actionList.add(supplementaryAction);
>         }
> The list the RuleManager returns should not be modified; it is actually
> an internal data structure. This does point out that DefaultRuleManager

I see. You are right.

> should be a little more defensive, perhaps returning
>   Collections.unmodifiableList(actionList)
> which would cause the above add call to fail and throw an exception.
> That would, however, cause an object to be created on each
> getMatchingActions call, which I would prefer to avoid if possible.
> How about this instead?
>     public List getMatchingActions(String path) {
>         List actionList = super.getMatchingActions(path);
>         ...
>         if (supplementaryAction != null) {
>             newActionList = new ArrayList(actionList);
>             newActionList.add(supplementaryAction)
>         }
> 
> I suppose this could be done instead:
>         if (supplementaryAction != null) {
>             if (actionList.contains(supplementaryAction) {
>               // nothing to do, we must have added the
>               // supplementary action to this list earlier
>             } else {
>               // permanently modify DefaultRuleManager's
>               // internal data structure; yecch!
>               actionList.add(action)
>             }
>         }
> but as the comments indicate, I think that having one class manipulate
> another's internal data is not in terribly good taste, nor easy to
> maintain.

Got that. I will think about that and provide something new.

> Why is there only facility for one fallback action, and one
> supplementary action? Wouldn't having a List of actions for each be no
> more difficult and potentially more useful?

Maybe. Will do so.
 
> ===
> 
> On a minor SVN note, the SVN property "svn:keywords" needs to be set on
> a file in order to enable expansion of $Id$ into the appropriate text.
> This can be done manually via:
>   svn propset svn:keywords "Id" filename
> It takes a commit to update the repository. I suggest this would be nice
> to do next time you update the SupplementaryRuleManager class, otherwise
> that text in the first line of the file will never change. Adding the
> following to your ~/.subversion/config file will ensure that each .java
> file that you add to subversion automatically gets $Id$ expansion
> enabled:
>   [auto-props]
>   *.java = svn:keywords=Id

Tried that. Hopefully done now.

> > (2) Why are the guts of Context accessible to the Action as well? They
> > should only be interesting to the Digester core, not to the
> > application. Why not adding an interface that hides the implentation
> > details? Like
> >
> > public interface Context {
> > Path getCurrentPath();
> > String getMatchPath();
> >
> > ClassLoader getClassLoader();
> > Object getRoot();
> > boolean isEmpty();
> > int getStackSize();
> > Object peek();
> > Object peek(int n);
> >
> > putItem(ItemId id, Object data);
> > Object getItem(ItemId id);
> > }
> >
> > and ContextImpl being what Context is now. Similar thing with Path. If
> > you agree I could do the work.
> 
> Yes, I thnk you're right that Context/ContextImpl would be sensible. In
> addition, there is some stuff curently in Context that really is meant
> for the SAXHandler only, not the Action classes, so that would allow us
> to make that distinction. How about we let Context stabilise a bit
> first, though? I've found it is the class I change most often, and if we
> have interface and implementation for it, then that's two places to
> change things.

OK.
 
> Is Path really complex enough to need interface + implementation
> separation? I tend to think not, but am open to persuasion.

It is not about complexity, but a simple and intuitive interface for
the user who writes an Action. Certainly, push, pop and clear should
never be used by the user. Doing so would even be a hazard, I guess.
So, these setter methods should not be part of the interface.

Do you agree?

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Simon Kitching <sk...@apache.org>.

On Sat, 2005-02-12 at 08:28 +0100, Oliver Zeigermann wrote:
> Hi Simon,
> 
> I have now committed the first proposal for the XMLIO like rule
> manager called SupplementaryRuleManager. I also added a very basic
> test for it. I had to make a minor visibility change in
> DefaultRuleManager that it extends. Path now has match methods.
> 
> All this isn't perfect, especially Javadoc is missing and tests need
> to be extended, but I wanted to make this public for discussion as
> soon as possible.
> 
> So, comments?

One design goal was to avoid using protected members. The problem here
is that using protected for a *member variable* locks us into an
implementation forever (a protected member is part of the "external" api
that users can access). I think it preferable to add a public or
protected method that provides access to the member rather than exposing
the member directly. The method can later get the data in some manner
other than reading from a member if we change the implementation. Note
that having a method that returns a Map isn't a *whole* lot better. I
think what we need is a "copy-helper" type method rather than exposing
the namespace member just so a subclass can copy it. I can't think what
the best solution is at the moment, but I feel there's an elegant (and
probably fairly well known) solution nearby. How does clone() solve this
issue?

Note that I have (I think) been rigorous about avoiding protected
members in the o.a.c.digester2 package, but not so rigorous in the
action subpackage. I think it probably is a good idea to apply this to
the action package too, though.

=== 

I would suggest that the match method be placed somewhere other than on
the Path class. Digester could well provide multiple different matching
implementations in future. I think a static method on some XMLIOUtils
class might be appropriate. The user's hand-written Action can then
choose whether to use that matching algorithm or some other. All that
Path currently does is return a String representing the "canonical form"
of the Path, and that seems clean & focussed to me.

=== 

This bit in SupplementaryRuleManager will cause problems I think:
    public List getMatchingActions(String path) {
        List actionList = super.getMatchingActions(path);
        ...
        if (supplementaryAction != null) {
            actionList.add(supplementaryAction);
        }
The list the RuleManager returns should not be modified; it is actually
an internal data structure. This does point out that DefaultRuleManager
should be a little more defensive, perhaps returning 
  Collections.unmodifiableList(actionList)
which would cause the above add call to fail and throw an exception.
That would, however, cause an object to be created on each
getMatchingActions call, which I would prefer to avoid if possible.
How about this instead?
    public List getMatchingActions(String path) {
        List actionList = super.getMatchingActions(path);
        ...
        if (supplementaryAction != null) {
            newActionList = new ArrayList(actionList);
            newActionList.add(supplementaryAction)
        }

I suppose this could be done instead:
        if (supplementaryAction != null) { 
            if (actionList.contains(supplementaryAction) {
              // nothing to do, we must have added the
              // supplementary action to this list earlier
            } else {
              // permanently modify DefaultRuleManager's 
              // internal data structure; yecch!
              actionList.add(action)
            }
        }
but as the comments indicate, I think that having one class manipulate
another's internal data is not in terribly good taste, nor easy to
maintain.

=== 

As I mentioned earlier, I'm happy to put the "fallbackAction"
functionality into DefaultRuleManager. 

=== 

Why is there only facility for one fallback action, and one
supplementary action? Wouldn't having a List of actions for each be no
more difficult and potentially more useful?

=== 

On a minor SVN note, the SVN property "svn:keywords" needs to be set on
a file in order to enable expansion of $Id$ into the appropriate text.
This can be done manually via:
  svn propset svn:keywords "Id" filename
It takes a commit to update the repository. I suggest this would be nice
to do next time you update the SupplementaryRuleManager class, otherwise
that text in the first line of the file will never change. Adding the
following to your ~/.subversion/config file will ensure that each .java
file that you add to subversion automatically gets $Id$ expansion
enabled:
  [auto-props]
  *.java = svn:keywords=Id

> And, by the, now that I have done some work and I came across some new
> questions. Maybe they are just dumb, but what do you think about this?
> 
> (1) Why aren't relative paths to actions also cached?

The current DefaultRuleManager implementation is straight from the old
RulesBase class, except that absolute paths are expected to start with
"/".

I have plans for a major rework of DefaultRuleManager. In particular, I
would love to support /foo/bar[id='1']  or similar. I plan to have a
look at STX (stx.sourceforge.net) to see if I can steal ideas from there
too.

> (2) Why are the guts of Context accessible to the Action as well? They
> should only be interesting to the Digester core, not to the
> application. Why not adding an interface that hides the implentation
> details? Like
> 
> public interface Context {
> Path getCurrentPath();
> String getMatchPath();
> 
> ClassLoader getClassLoader();
> Object getRoot();
> boolean isEmpty();
> int getStackSize();
> Object peek();
> Object peek(int n);
> 
> putItem(ItemId id, Object data);
> Object getItem(ItemId id);
> }
> 
> and ContextImpl being what Context is now. Similar thing with Path. If
> you agree I could do the work.

Yes, I thnk you're right that Context/ContextImpl would be sensible. In
addition, there is some stuff curently in Context that really is meant
for the SAXHandler only, not the Action classes, so that would allow us
to make that distinction. How about we let Context stabilise a bit
first, though? I've found it is the class I change most often, and if we
have interface and implementation for it, then that's two places to
change things.

Is Path really complex enough to need interface + implementation
separation? I tend to think not, but am open to persuasion.

> 
> (3) Bringing this up again: Wouldn't it be more flexible to allow more
> than a String as a pattern in RuleManager#addRule(String pattern,
> Action action)?

If you can give an example of a pattern that cannot be reasonably
represented as a string, I'll happily back an additional
RuleManager#addRule(Pattern, Action) method.

But xpath, xquery, xslt, etc all use just strings to express which
elements are to be matched. So if it is good enough for the W3C, isn't
it good enough for digester?

Or have I misunderstood?

Regards,

Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Oliver Zeigermann <ol...@gmail.com>.

Hi Simon,

I have now committed the first proposal for the XMLIO like rule
manager called SupplementaryRuleManager. I also added a very basic
test for it. I had to make a minor visibility change in
DefaultRuleManager that it extends. Path now has match methods.

All this isn't perfect, especially Javadoc is missing and tests need
to be extended, but I wanted to make this public for discussion as
soon as possible.

So, comments?

And, by the, now that I have done some work and I came across some new
questions. Maybe they are just dumb, but what do you think about this?

(1) Why aren't relative paths to actions also cached?

(2) Why are the guts of Context accessible to the Action as well? They
should only be interesting to the Digester core, not to the
application. Why not adding an interface that hides the implentation
details? Like

public interface Context {
Path getCurrentPath();
String getMatchPath();

ClassLoader getClassLoader();
Object getRoot();
boolean isEmpty();
int getStackSize();
Object peek();
Object peek(int n);

putItem(ItemId id, Object data);
Object getItem(ItemId id);
}

and ContextImpl being what Context is now. Similar thing with Path. If
you agree I could do the work.

(3) Bringing this up again: Wouldn't it be more flexible to allow more
than a String as a pattern in RuleManager#addRule(String pattern,
Action action)?

Oliver

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [digester2] Additions

Posted by Simon Kitching <sk...@apache.org>.

Hi Oliver,

On Thu, 2005-02-10 at 19:08 +0100, Oliver Zeigermann wrote:
> Folks,
> 
> as I noticed Simon has done so much work on Digester2, I just wanted
> to be sure that my scheduled additions still are appropriate and
> aligned to the overall design. Here they are:
> 
> (1) XMLIORuleManager: A rule manager that takes an action and
> unconditionally calls it on any event. It's useful to imitate xmlio
> style processing, but not limited to this. One application might be
> logging. As it is more general any proposal for a better name?
> 
> Quoting code sample from Simon:
> 
> > // xmlio-style digester
> > Action myHandler = new AbstractAction() {
> >   public void begin(
> >    Context context,
> >    String namespace, String name,
> >    Attributes attrs) {
> >
> >     String path = context.getMatchPath();
> >     if (path.equals("......")) {
> >         ....
> >     } else {
> >         ....
> >     }
> >   }
> >
> >   public void body(...) {
> >   }
> > }
> >
> > RuleManager xmlioRuleManager = new XMLIORuleManager(myHandler);
> > Digester d  = new Digester();
> > d.setRuleManager(xmlioRuleManager);
> >
> 
> As an option it might be possible to register an action with any rule
> manager that gets called unconditionally - epsecially for logging and
> debugging this can be really helpful. However,  I wasn't quite sure if
> Simon was happy with that.
> 

Adding functionality to DefaultRuleManager that returns a set of default
actions *if no other action matches the current path* is not a problem.
I'm all for it. And this would meet your requirements, yes? So I'm happy
for you to go with this if you agree. And as a bonus, implementation
should be trivial: a dozen lines or so. I'm not sure whether this API
should then be exposed via the RuleManager interface (ie required for
all RuleManagers) or just left on DefaultRuleManager.

Adding functionality that returns a set of actions *in addition to* the
actions that match the current path is trickier. In what order are they
returned? And can we avoid instantiating a new List object each time
getMatchingActions is called? I'm not saying I'm opposed to it, I'm just
not sure right now.

> (2) Next to the body and bodySegment call back methods there might be
> one that gives you the complete body of an element as a string
> composed of all character and markup data. This might be useful when
> you want to verbosely take over formatted mixed content like in XHTML.
> 
> For performance reasons there should be some mechanism that tells
> digester when such content is needed on an element basis. Maybe
> something returned by the begin method or - maybe cleaner - something
> returned by an additional call back directly called after begin like
> boolean requiresComposedBody(String namespace, String name) or
> anything similar. This method could alternatively be part of an
> additional or extended interface.

Yep. I'm in favour of this. I would prefer an eye to be kept on
performance; the "requiresComposedBody" solution doesn't tempt me
greatly as that is a call that must be made on each Action each time it
matches, and 99.999% of those calls will return false. I could live with
it if there is no better alternative. However if an action's begin
method could enable the body text collection and the end method disable
it then wouldn't that have the same effect without additional work when
the feature isn't being used? I agree it's a little less intuitive for
Action writers though...and I haven't thought it through fully.

Note that I am tentatively planning to do some significant rework of the
DefaultRuleManager class. But that certainly won't be for some weeks so
hack away! I guess you'll also be wanting to work on the Path class, so
the custom Action class can more easily test the current path? I'll make
sure I check on this list before altering this class (though I've got no
plans to do so).

Cheers,

Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org