You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Simon Kitching <si...@ecnetwork.co.nz> on 2004/04/01 08:23:44 UTC

RE: [digester] mixed content update

On Wed, 2004-03-31 at 10:55, Edelson, Justin wrote:
> Simon - Sorry I haven't had a chance to respond to your email. I was
> actually more concentrating on answering your question about use-cases,
> but it sounds like I don't need to sell this need as much as I thought I
> did (at least for now).

Well, not to me anyway. 

It would be nice to know what some of the other Digester regulars think
though [well, that really means Robert Donkin and Craig McClanahan].
They've been involved with Digester much longer than myself.


> > If someone (eg Justin) is keen to work on this now, we could
> potentially get it in the next release. Otherwise I suggest this could
> go on the to-do list for post-1.6.
> 
> I don't know if "keen" is the right word, but I'm committed to
> "digesting" mixed content XML for an application now, i.e. I need to
> solve this problem one way or another (or use some other XML ingesting
> mechanism, which I don't want to do for purely selfish reasons). At this
> point, I'm planning on an implementation as a subclass. Whether this
> subclass is accepted and put into the Digester release is dependent upon
> a variety of factors, but my intention is to develop a solution either
> way. I have sign-off on contributing modifications back to Apache, so
> that's not an issue.

That sounds great. Welcome to Digester development!

By the way, I looked at your bugzilla entry, but there seems to have
been some trouble with the attachments. It looks like the "core"
implementation was not attached, just some "auxillary files".

> Just to be clear, is there a timeframe for 1.6?

I am *very* keen to get a release out the door within 4 weeks.
Unfortunately, people keep coming up with really cool features to
include :-). We've been talking about a release for a while now, but
no-one else has suggested a goal date. One potential holdup is that it
would be nice for BeanUtils to release first - though it's not critical.

I'm currently shipping a commercial product with a digester library
built from CVS, and would love to get a release out to fix that - as
well as let people use all the nice features that have been added
recently.


> Let's separate the two issues in my original design - Using a special
> text designator and the specific designator used. To be honest, @text
> was really just a placeholder on my end. The only requirement I have for
> this designator is that it be an illegal XML element name so as to
> ensure that there's no conflict (i.e. if the designator was just text,
> that would pose an issue if you had an element named text). My core
> argument in favor of using a specific designator is that it explicitly
> indicates that the pattern (i.e. /element/@text) uses different
> functionality then the traditional Digester method. This is a pretty
> weak argument, I'll admit. I also feel (and can't prove yet) that this
> method is better performance-wise because the extra iterations over the
> list of rules is over a smaller list.

Well, "@" does have a special meaning in xpath. Digester's patterns
aren't xpath-compliant anyway, but it would be nice to avoid gratuitous
confusion. In xpath, "foo/@bar" means "the value of the bar attribute on
the foo element".

> 
> As indicated above, I'm very willing to try alternate implementations,
> including the interface solution you suggested.

> My only concern about the interface solution (for lack of a better name)
> is when you wrote 'for each rule matched by the last call to
> startElement' - In my original subclassed-version, I had a call to
> super.startElement(), but in order to do what you've described, I think
> you'd need to replicate all the code in Digester.startElement() in the
> subclasses startElement() method. Otherwise, the overridden
> startElement() in the subclass would have to make an extra call to
> Rules.match(). I was originally worried about having to maintain the
> subclass's startElement method to reflect changes in the Digester
> implementation, thus the call to super.startElement(). Is this too
> dogmatic? I'm not looking to rehash the cut-and-paste vs. "eating our
> own dog food" discussion recently seen in the context of [lang].

Attached is a working solution based on my original proposal. It is
integrated into Digester rather than being a subclass. I think the
changes are simple enough, and performance impact low enough, that
integrating this into the mainline is ok. Obviously, other developers
are expected to chip in with their opinion on that....

Note that this patch assumes the one I posted earlier today (to keep a
stack of matched rules) is also accepted.

With this initial patch, there is only trivial increase in memory usage.

Regarding CPU overhead, each time an element starts or ends, it is
necessary to iterate over the list of Rules matched by its parent (which
was saved on the stack). Assuming that each tag matches 3 rules, that
means iterating over a list of 3 rules each time, and doing an
instanceof operation against the rule. 

That overhead would apply even if the user isn't interested in text
segments, but it is really pretty low compared to all the other work
that digester is doing.

But this could be optimised further by having a stack of Boolean objects
which parallels the stack of matched rules. On element start, we iterate
over the list of matched rules anyway, so could simply apply an
instanceof operation to each rule as we go, to determine if this set of
matches contains a rule implementing TextSegmentHandler, and push the
true/false state in parallel with the matched rules. Later, we can skip
the iteration over the list looking for TextSegmentHandler rules if the
corresponding flag tells us there aren't any.

Or Digester could have some kind of "enable text segment processing"
method that needs to be called if the user wants this feature, and if
not enabled then this processing is simply skipped.

Anyway, here's the code for review. All comments welcome.

And by the way, the Digester patch file includes both the mods for this
*and* the mods I posted earlier for implementing a "stack of matches"
which is really separate, and has value on its own. So the changes for
mixed-content support are less than they appear.

Regards,

Simon


Re: possibility of 1.6 digester [WAS Re: [digester] mixed content update]

Posted by robert burrell donkin <ro...@blueyonder.co.uk>.
On 6 Apr 2004, at 21:01, David Graham wrote:

> Robert,
> Why not try to get on the PMC so you can feel ok about managing 
> releases?
> You do a lot of work around here so I find it hard to believe you 
> wouldn't
> be welcome on the PMC :-).

i resigned :)

i don't have any plans to rejoin (at least whilst jakarta has 
sub-projects).

- robert


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: possibility of 1.6 digester [WAS Re: [digester] mixed content update]

Posted by David Graham <gr...@yahoo.com>.
Robert,
Why not try to get on the PMC so you can feel ok about managing releases? 
You do a lot of work around here so I find it hard to believe you wouldn't
be welcome on the PMC :-).

David

--- robert burrell donkin <ro...@blueyonder.co.uk> wrote:
> On 5 Apr 2004, at 00:42, Simon Kitching wrote:
> 
> > On Mon, 2004-04-05 at 08:37, robert burrell donkin wrote:
> >> as far as i'm concerned the major issue with a 1.6 is (and has been 
> >> for
> >> a while) finding a release manager. craig and i have different views 
> >> on
> >> who's eligible for this role. you might prefer craig's view's to mine
> >> in this case.
> >
> > I plan to follow your advice. I would like to find someone on the PMC
> > who is willing to act as the release manager if I volunteer to do all
> > the donkey-work. I don't feel comfortable being the release manager
> > myself as I've never been involved in a jakarta project release
> before.
> > I hope that it won't be too hard to find an appropriate person.
> 
> craig's an apache member (as well as a jakarta pmc member). if he's 
> around and thinks that a new digester release is a good idea then he's 
> better than most people you'll find on general so why not ask him...?
> 
> - robert
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 


__________________________________
Do you Yahoo!?
Yahoo! Small Business $15K Web Design Giveaway 
http://promotions.yahoo.com/design_giveaway/

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


possibility of 1.6 digester [WAS Re: [digester] mixed content update]

Posted by robert burrell donkin <ro...@blueyonder.co.uk>.
On 5 Apr 2004, at 00:42, Simon Kitching wrote:

> On Mon, 2004-04-05 at 08:37, robert burrell donkin wrote:
>> as far as i'm concerned the major issue with a 1.6 is (and has been 
>> for
>> a while) finding a release manager. craig and i have different views 
>> on
>> who's eligible for this role. you might prefer craig's view's to mine
>> in this case.
>
> I plan to follow your advice. I would like to find someone on the PMC
> who is willing to act as the release manager if I volunteer to do all
> the donkey-work. I don't feel comfortable being the release manager
> myself as I've never been involved in a jakarta project release before.
> I hope that it won't be too hard to find an appropriate person.

craig's an apache member (as well as a jakarta pmc member). if he's 
around and thinks that a new digester release is a good idea then he's 
better than most people you'll find on general so why not ask him...?

- robert


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [digester] mixed content update

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
On Mon, 2004-04-05 at 08:37, robert burrell donkin wrote:
> as far as i'm concerned the major issue with a 1.6 is (and has been for 
> a while) finding a release manager. craig and i have different views on 
> who's eligible for this role. you might prefer craig's view's to mine 
> in this case.

I plan to follow your advice. I would like to find someone on the PMC
who is willing to act as the release manager if I volunteer to do all
the donkey-work. I don't feel comfortable being the release manager
myself as I've never been involved in a jakarta project release before.
I hope that it won't be too hard to find an appropriate person.

> it might be wiser not to add mixed content support for 1.6 (i have some 
> possible concerns but i'll need to go through the proposed code). there 
> is no reason, though, why the release plan could not involve taking a 
> 1.6 release branch. this would allow development of mixed content to 
> continue whilst preparing for the 1.6 release.

I look forward to seeing your comments. Certainly if there are concerns
then we can skip this feature for the next release.


I think that the proposed "call method when params available"
functionality should be in the release if possible, though. I believe
that this is necessary for Betwixt to support the important "immutable
object", and it seems a shame to make a release that Betwixt cannot base
itself on. It does concern me that this significant feature may go in so
close to a release. I guess the decision on this depends on feedback
again. I'm nearly ready to release an implementation for review, and
have been extra-careful to build lots of unit tests for both it and
CallMethodRule (you may have seen some new unit tests committed for
CallMethodRule).


I've also been thinking of the possibility of making an official "1.6
beta-1" release, available for download by users. There are so many new
features that the user community aren't aware of, and I would like to
get their feedback/testing. Is there a standard mechanism for making
beta releases of commons code in Maven, making them available for
download from the commons website?

Regards,

Simon



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [digester] mixed content update

Posted by robert burrell donkin <ro...@blueyonder.co.uk>.
On 1 Apr 2004, at 07:23, Simon Kitching wrote:
> On Wed, 2004-03-31 at 10:55, Edelson, Justin wrote:

<snip>

>> Just to be clear, is there a timeframe for 1.6?
>
> I am *very* keen to get a release out the door within 4 weeks.
> Unfortunately, people keep coming up with really cool features to
> include :-). We've been talking about a release for a while now, but
> no-one else has suggested a goal date. One potential holdup is that it
> would be nice for BeanUtils to release first - though it's not 
> critical.
>
> I'm currently shipping a commercial product with a digester library
> built from CVS, and would love to get a release out to fix that - as
> well as let people use all the nice features that have been added
> recently.

as far as i'm concerned the major issue with a 1.6 is (and has been for 
a while) finding a release manager. craig and i have different views on 
who's eligible for this role. you might prefer craig's view's to mine 
in this case.

it might be wiser not to add mixed content support for 1.6 (i have some 
possible concerns but i'll need to go through the proposed code). there 
is no reason, though, why the release plan could not involve taking a 
1.6 release branch. this would allow development of mixed content to 
continue whilst preparing for the 1.6 release.

- robert


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


RE: [digester] mixed content update

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
On Thu, 2004-04-01 at 18:23, Simon Kitching wrote:
> Attached is a working solution based on my original proposal. It is
> integrated into Digester rather than being a subclass. I think the
> changes are simple enough, and performance impact low enough, that
> integrating this into the mainline is ok. Obviously, other developers
> are expected to chip in with their opinion on that....

I should note that this is still *just one candidate solution*. I'm
still interested in seeing your solution if you can post it or attach it
to the bugzilla entry. We can then compare the pros/cons of each
approach.

Cheers,

Simon


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org