Posted to dev@commons.apache.org by Reid Pinchback <re...@yahoo.com> on 2004/09/10 19:18:33 UTC

[digester] Are performance improvements wanted?

I just finished a project where I had to do a fair bit
of performance tuning work over the last year.  I was
looking through the current digester source, and even
without torquing the code weirdly or changing class
APIs I've seen places that could probably be made
faster.

1) Would folks be interested in digester performance
fixes?  No point in my wasting time on them if, for
example, some major re-write is underway.

2) What would be the preferred way of submitting them?
 I was thinking of submitting a tweaked class as an
enhancement request with an attached patch and maybe a
unit test that measured both the old and new code. 
People could use the test to try the changes on other
platforms (I'd only be testing on some Win32 sdk
versions, but the fixes I have in mind should either
help or at least do no harm on other platforms).  

How much of a gain people would see in real use of
course would depend on what they were doing; I'm
expecting these fixes to matter more in situations
where digesters would run frequently (e.g. SOAP) and
developers have, where feasible, already dealt with
the obvious (factoring out rule+parser factory+parser
instantiations).

Thanks

     Reid




		

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [digester] Are performance improvements wanted?

Posted by Reid Pinchback <re...@yahoo.com>.
--- robert burrell donkin
<ro...@blueyonder.co.uk> wrote:
> i wonder whether it would be possible to calibrate
> JVM performance 
> using a series of tests and then use that rating to
> work out what the 
> timings should be on different platforms.

The concept of calibration is tempting, but
I'm not convinced you could readily use JUnitPerf
for it.  JIT activity in the JVM is very sensitive
to the environment of the test.  Imagine trying
to test a memory allocation algorithm using a
test rig that constantly caused virtual memory
page faults.  Not only wouldn't you get very 
realistic results, but they'd be very, very
sensitive to platform-specific differences that
the Java performance testing code couldn't
account for.  I haven't kicked the tires on it
yet, but JDK 1.5 has a new profiling API that
might help with this.






		


Re: [digester] Are performance improvements wanted?

Posted by robert burrell donkin <ro...@blueyonder.co.uk>.
On 12 Sep 2004, at 15:46, Phil Steitz wrote:

>
>> i've been thinking about the problem of proving performance 
>> improvements by using unit tests for a while now. i'd really like to 
>> be able to create reports about the current performance of 
>> library code. maybe it'd be possible to use some kind of 
>> normalization to eliminate (or at least reduce) platform specific 
>> differences. i'd be interested to hear comments from other folks 
>> about this (or ideally, hear about a tool out there which does this 
>> ;)
>
> I have not personally used it, but JUnitPerf 
> <http://www.clarkware.com/software/JUnitPerf.html> looks like it is 
> designed to measure performance changes in unit tests.  It is 
> BSD-licensed.
>
> The approach used in o.a.c.beanutils.BeanUtilsBenchCase -- creating a 
> separate "microbenchmarks" test case with timing included -- could 
> probably also be applied to [digester] and other commons components.
>
> I have no clue how one would go about eliminating platform-specific 
> differences. Could be the best we can do is make microbenchmark test 
> suites available and set up a place where users can report results on 
> different components for different platforms. The Wiki is a natural 
> place to report things; but does it support forms well enough to 
> organize the results?

i did take a quick look at JUnitPerf a while ago. i haven't been 
through it in detail, but although it looks like it would work well in a 
commercial situation running on a central continuous integration box, 
open source development needs something that can run on different 
platforms.

i wonder whether it would be possible to calibrate JVM performance 
using a series of tests and then use that rating to work out what the 
timings should be on different platforms.

- robert




Re: [digester] Are performance improvements wanted?

Posted by Reid Pinchback <re...@yahoo.com>.
I won't repeat my previous comments re: JUnitPerf,
but they apply here too.  Just looked at the bench
case stuff, looks decent, better for fast tests of
small code fragments.  Whether it is appropriate
or not depends on what you are trying to achieve.
If you want to be able to record measurements
(e.g. in some historical performance file) and
compare against that, the approach is fine.

What I'm a bit more concerned about right now
is comparing, at more or less the same time,
the timings of two pieces of code in the same
environment.  I'd like the test to know if
I've achieved an improvement or not.
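
A minimal sketch of that kind of side-by-side comparison, assuming
a modern JVM (lambdas did not exist when this thread was written).
The class name PairedTiming, the methods timeOnce and speedup, and
the two string-building candidates are all hypothetical; nothing
here comes from the Digester source.  It warms both candidates up,
then alternates them within each round so that JIT and GC noise
falls on both roughly equally.

```java
import java.util.function.Supplier;

final class PairedTiming {
    // Results are written here so the timed work has an observable effect.
    static volatile Object sink;

    /** Runs body `iterations` times and returns the elapsed nanoseconds. */
    static long timeOnce(Supplier<?> body, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = body.get();
        }
        return System.nanoTime() - start;
    }

    /** Best-of-rounds time of oldCode divided by best-of-rounds time of newCode. */
    static double speedup(Supplier<?> oldCode, Supplier<?> newCode,
                          int rounds, int iterations) {
        // Untimed warm-up pass so both candidates get a chance to be compiled.
        timeOnce(oldCode, iterations);
        timeOnce(newCode, iterations);

        long bestOld = Long.MAX_VALUE;
        long bestNew = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            // Alternate the candidates so pauses and machine load hit both alike.
            bestOld = Math.min(bestOld, timeOnce(oldCode, iterations));
            bestNew = Math.min(bestNew, timeOnce(newCode, iterations));
        }
        // Guard against a zero reading on a coarse clock.
        return (double) Math.max(bestOld, 1L) / Math.max(bestNew, 1L);
    }

    public static void main(String[] args) {
        // Hypothetical "old" and "new" versions of the same small task.
        Supplier<String> concat = () -> {
            String s = "";
            for (int i = 0; i < 100; i++) s += i;
            return s;
        };
        Supplier<String> builder = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100; i++) sb.append(i);
            return sb.toString();
        };
        System.out.println("old/new time ratio: " + speedup(concat, builder, 5, 2000));
    }
}
```

A ratio above 1.0 suggests the new code is faster on this machine;
a unit test could assert on that ratio with a generous margin, but
the margin is a judgment call and the number says nothing about
other platforms.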

On the issue of platform-specific differences,
I agree, that is tough.  The problem with
posting numbers is that systems vary so much
it's hard to draw conclusions.  If somebody
claimed to have similar hardware and O/S to
you, if their numbers are the same, higher,
or lower than yours, what does it tell you?
Unfortunately, the data is from an experiment
that is too uncontrolled to help a developer
decide if a proposed code change is likely
to be faster across multiple platforms.

If you are inclined to muse in the direction
of random impractical thoughts, you could
envision a small reference set of Java code
fragments.  Measure Digester performance in
terms of the reference set.  That performance
number should be platform independent, while
the actual results on any given platform would 
be finally determined by the raw performance of 
the reference set.  That is essentially the
technique used in a variety of numerical
modeling, estimation, or optimization approaches.
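
As a rough sketch of the reference-set idea, assuming a modern JVM:
time a few fixed fragments, fold them into one figure with a
geometric mean, and report the code under test in units of that
figure.  The class ReferenceNormalizer, its methods, and the sample
fragments are hypothetical, and the geometric mean is only one
plausible way to combine the reference timings.

```java
final class ReferenceNormalizer {
    /** Best-of-rounds wall time for one task, in nanoseconds (never less than 1). */
    static long bestNanos(Runnable task, int rounds) {
        task.run(); // untimed warm-up
        long best = Long.MAX_VALUE;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return Math.max(best, 1L);
    }

    /** Geometric mean of the reference timings: one "raw speed" figure per platform. */
    static double geometricMean(long[] nanos) {
        double logSum = 0.0;
        for (long n : nanos) logSum += Math.log(n);
        return Math.exp(logSum / nanos.length);
    }

    /** Target time expressed in units of the reference figure. */
    static double normalized(long targetNanos, double referenceNanos) {
        return targetNanos / referenceNanos;
    }

    public static void main(String[] args) {
        // Hypothetical reference fragments; a real set would need careful choosing.
        Runnable sortArray = () -> java.util.Arrays.sort(new int[] {5, 3, 9, 1, 7});
        Runnable buildString = () -> new StringBuilder("a").append(1).toString();
        long[] reference = { bestNanos(sortArray, 20), bestNanos(buildString, 20) };

        // Hypothetical stand-in for the code under test.
        Runnable target = () -> String.valueOf(12345).hashCode();
        double score = normalized(bestNanos(target, 20), geometricMean(reference));
        System.out.println("time in reference units: " + score);
    }
}
```

Whether such a score really stays stable across platforms depends
on how closely the reference fragments resemble the code being
measured, which is the hard part.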

Definitely a pie-in-the-sky category of solution.
Maybe put it on the Wiki for, oh, Digester 27.0.

:-)



--- Phil Steitz <ph...@steitz.com> wrote:

> The approach used in
> o.a.c.beanutils.BeanUtilsBenchCase -- creating a 
> separate "microbenchmarks" test case with timing
> included -- could 
> probably also be applied to [digester] and other
> commons components.


> 
> I have no clue how one would go about eliminating
> platform-specific 
> differences.



		


Re: [digester] Are performance improvements wanted?

Posted by Phil Steitz <ph...@steitz.com>.
> 
> 
> i've been thinking about the problem of proving performance improvements 
> by using unit tests for a while now. i'd really like to be able to 
> create reports about the current performance of library code. 
> maybe it'd be possible to use some kind of normalization to eliminate 
> (or at least reduce) platform specific differences. i'd be interested to 
> hear comments from other folks about this (or ideally, hear about a tool 
> out there which does this ;)

I have not personally used it, but JUnitPerf 
<http://www.clarkware.com/software/JUnitPerf.html> looks like it is 
designed to measure performance changes in unit tests.  It is BSD-licensed.

The approach used in o.a.c.beanutils.BeanUtilsBenchCase -- creating a 
separate "microbenchmarks" test case with timing included -- could 
probably also be applied to [digester] and other commons components.

I have no clue how one would go about eliminating platform-specific 
differences. Could be the best we can do is make microbenchmark test 
suites available and set up a place where users can report results on 
different components for different platforms. The Wiki is a natural place 
to report things; but does it support forms well enough to organize the 
results?

Phil



Re: [digester] Are performance improvements wanted?

Posted by Reid Pinchback <re...@yahoo.com>.
--- Simon Kitching <si...@ecnetwork.co.nz> wrote:

> You should be warned, though, that the logging area
> is particularly tricky. 

Yup, I figured that could be the case.  Before
I even proposed this I'd already decided that I'd
just float each change as a proposal, and just grin
and bear it if there was something that made the
change unwise.  While you strive to create performance
fixes that don't change behaviour at all, sometimes
you run into cases where that isn't true.  When
that happens, folks have to decide if the change
would be to something that mattered, or not.

> From what I remember, there is a requirement
> that frameworks
> which use digester (eg j2ee app servers) must be
> able to direct logging
> output to different destinations depending on which
> "app" the framework
> is running the digester on behalf of.
...
> I was not able to find a
> better way to organise logging while satisfying the
> original
> requirements.
> 
> I'm not saying there *isn't* a way to improve
> digester logging, just
> that it is probably necessary to read that email
> thread first to be sure
> the "improvements" still satisfy the requirements as
> described by Craig.

Ok, I'll see if I can find anything archived about
that.  At a guess I bet it's something like the
following:

- getLogger returns a reference to a logger
- Digester instances currently each have their
  own reference
- if you use that reference to change the logger
  behaviour for your Digester, do you change only
  your own logging, or everybody else's logging
  via the Digester/Digester.sax categories, and
  would sharing a static logger change that?

Can't say I've traced this kind of thing through
log4j, but I'd have expected that changing the 
logger changed everybody's logging via the same
category against the same repository.  Could be I'm
wrong.  Normally I'd expect that if multiple clients
needed different control of logging for the same
category, they'd need to have their own repositories.
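
The expectation above can be illustrated with java.util.logging,
used here purely as a stand-in: it is not log4j or commons-logging,
and this sketch does not establish how either of those behaves
inside a container.  The class SharedCategoryDemo, the nested
Component class, and the category name "demo.Digester" are
hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class SharedCategoryDemo {
    static final String CATEGORY = "demo.Digester"; // hypothetical category name

    /** Stand-in for an object that keeps its own reference to a named logger. */
    static final class Component {
        final Logger log = Logger.getLogger(CATEGORY);
    }

    public static void main(String[] args) {
        Component a = new Component();
        Component b = new Component();

        // Change the level through one component's reference...
        a.log.setLevel(Level.SEVERE);

        // ...and the other component sees it, because both fields refer to
        // the single Logger registered under that name.
        System.out.println("same logger object: " + (a.log == b.log));
        System.out.println("level seen by b:    " + b.log.getLevel());
    }
}
```

In this stand-in, holding a per-instance reference does not give
per-instance control; the logger is shared by name either way.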

In any case, I'm not overly worried about "winning"
on this particular change.  It's the kind of thing
that matters more during development than during
execution - it's a measurable drag on running unit
tests that instantiate Digester instances in loops,
but not such a big deal in real-life Digester usage.

Not an issue for now, but for the future I'm
particularly intrigued by some of the Wiki
comments for Digester 2.0, and how it might be
time to split out various areas of functionality.
I think at that point you might have a chance
to allow for some very serious performance
improvements in areas that wouldn't be possible
today without changing the API in undesirable ways.
I think a lot of the circular dependencies between
classes and packages that exist in Digester today 
are the initial sniff test of interesting
opportunities with a different approach.

   Reid



	
		


Re: [digester] Are performance improvements wanted?

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
On Mon, 2004-09-13 at 08:38, Reid Pinchback wrote:

> The first performance-related patch I'll submit 
> shows how I approximate this.  Mostly I try to
> minimize how much JIT, GC, and differences in
> inheritance hierarchy depth can distort the
> comparison.  The case I've put together is on 
> what the impact would be of handling logger
> initialization statically in the Digester class.
> Not a big win, obviously, but an easy example of 
> the approach.  Besides cutting constructor cost 
> in 1/2 is never bad.

Hi Reid,

I'm also interested in seeing performance patches. It's great to hear
you're working on this topic.

You should be warned, though, that the logging area is particularly
tricky. From what I remember, there is a requirement that frameworks
which use digester (eg j2ee app servers) must be able to direct logging
output to different destinations depending on which "app" the framework
is running the digester on behalf of.

There's some email discussion about logging in digester from about a
year back that goes into this in some depth; I was not happy with the
way logging worked in Digester but after Craig explained why it was the
way it was, and what the requirements were, I was not able to find a
better way to organise logging while satisfying the original
requirements.

I'm not saying there *isn't* a way to improve digester logging, just
that it is probably necessary to read that email thread first to be sure
the "improvements" still satisfy the requirements as described by Craig.

[of course these requirements should really be coded as unit tests so
that required behaviour *can't* be changed without unit test
failures....]

I'm certain, however, that there are a number of other places where
optimisations are available, and look forward to seeing some
improvements.

Regards,

Simon





Re: [digester] Are performance improvements wanted?

Posted by Reid Pinchback <re...@yahoo.com>.

--- robert burrell donkin
<ro...@blueyonder.co.uk> wrote:

> IMHO digester 
> 1 is approaching feature completeness (at least,
> given the limits of 
> backwards compatibility) and should continue to be
> maintained as a 
> mature, stable, well tested library. looking at
> performance issues now 
> seems appropriate

Ok, I'll see what I can come up with.   First I'm
trying to finish off one of the Digester 1.7 issues 
listed on the Wiki (can Digester deal with Ant
properties?).  I hope to have that finished today.

The nature of performance improvement investigations
tends to be "I won't know if it's faster until I know
if it's faster".  The typical, easy approach is to 
find bits of calculation that can be migrated from
more-frequently-executed areas to something less
frequent.  Another, sometimes tougher approach
is to find ways of eliminating frequently-invoked
runtime casts.  You can also get amazing changes
by rewriting 'for' loops, but that tends to be an
all-or-nothing thing; 10x better, or no change
at all.  I'm also inclined to see how much
rule match processing factors into Digester
performance.  In any case, I'll follow your advice
and submit smaller changes if I find anything.
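
As an illustration of the first approach (moving work from a
frequently executed spot to a less frequent one), here is a small
sketch.  It is not taken from the Digester source; the class
HoistingSketch and both methods are hypothetical.  The only library
fact it relies on is that String.matches compiles its regular
expression on each call.

```java
import java.util.regex.Pattern;

final class HoistingSketch {
    /** Before: the regular expression is recompiled for every element. */
    static int countMatchesSlow(String[] paths, String regex) {
        int n = 0;
        for (String p : paths) {
            if (p.matches(regex)) n++; // compiles the pattern on each call
        }
        return n;
    }

    /** After: the compile step is hoisted out of the loop and done once. */
    static int countMatchesFast(String[] paths, String regex) {
        Pattern compiled = Pattern.compile(regex);
        int n = 0;
        for (String p : paths) {
            if (compiled.matcher(p).matches()) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String[] paths = {"a/b", "a/c", "x/y"};
        // Both versions must agree; only the amount of repeated work differs.
        System.out.println(countMatchesSlow(paths, "a/.*"));
        System.out.println(countMatchesFast(paths, "a/.*"));
    }
}
```

The behaviour is unchanged, which is what makes this kind of fix
easy to review; how much time it saves still has to be measured.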


> i've been thinking about the problem of proving
> performance 
> improvements by using unit tests for a while now.
> i'd really like to be 
> able to create reports about the current
> performance of 
> library code. maybe it'd be possible to use some
> kind of normalization 
> to eliminate (or at least reduce) platform specific
> differences.

The first performance-related patch I'll submit 
shows how I approximate this.  Mostly I try to
minimize how much JIT, GC, and differences in
inheritance hierarchy depth can distort the
comparison.  The case I've put together is on 
what the impact would be of handling logger
initialization statically in the Digester class.
Not a big win, obviously, but an easy example of 
the approach.  Besides, cutting constructor cost 
in half is never bad.
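
For readers who have not seen the two styles side by side, a hedged
sketch of per-instance versus static logger initialization follows.
It uses java.util.logging as a stand-in rather than the logging
library Digester actually uses, and the classes LoggerInitSketch,
PerInstance, and Shared are hypothetical, so the timings it prints
say nothing definite about Digester itself.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

final class LoggerInitSketch {
    static final String CATEGORY = "demo.Digester"; // hypothetical category name
    static volatile Object sink; // keeps constructed objects observable

    /** Per-instance style: every new object looks its logger up again. */
    static final class PerInstance {
        final Logger log = Logger.getLogger(CATEGORY);
    }

    /** Static style: one lookup at class initialization, shared by all instances. */
    static final class Shared {
        static final Logger LOG = Logger.getLogger(CATEGORY);
        final Logger log = LOG;
    }

    /** Elapsed nanoseconds for n constructor calls. */
    static long nanosFor(Supplier<?> ctor, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) sink = ctor.get();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 100_000;
        nanosFor(PerInstance::new, n); // untimed warm-up for both styles
        nanosFor(Shared::new, n);
        System.out.println("per-instance: " + nanosFor(PerInstance::new, n) + " ns");
        System.out.println("shared:       " + nanosFor(Shared::new, n) + " ns");
    }
}
```

The static version trades a repeated lookup for a single shared
reference, so it only makes sense if sharing that reference is
acceptable for every user of the class.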

Somebody suggested JUnitPerf; I've used it, but
haven't been all that impressed by it.  I think
it's decent for monitoring performance on large
connected hunks of code, but not really any
good at helping you investigate how to speed
code up.  Invoking all the surrounding unit test
boilerplate just distorts the results too much.
While you hope to find 1 or 2 places in the
code that account for a large hunk of performance,
often you instead end up finding 20 or 30 places
that, in total, get you the desired change.
JUnitPerf is too coarse for that kind of work.

More effective is to just go and grab the free 
copy of JProbe, and use it to profile a suite of
unit tests to get some ideas.  Then you create
some carefully-designed unit tests that let you
compare before-and-after timing results.  The
unit tests don't have to depend on a commercially
produced tool, I just use it for an initial
"sniff test" because it lets me prune out
irrelevant CPU loads (like time spent on
unit test boilerplate).





		


Re: [digester] Are performance improvements wanted?

Posted by robert burrell donkin <ro...@blueyonder.co.uk>.
On 10 Sep 2004, at 18:18, Reid Pinchback wrote:

> I just finished a project where I had to do a fair bit
> of performance tuning work over the last year.  I was
> looking through the current digester source, and even
> without torquing the code weirdly or changing class
> APIs I've seen places that could probably be made
> faster.
>
> 1) Would folks be interested in digester performance
> fixes?  No point in my wasting time on them if, for
> example, some major re-write is underway.

though there's probably going to be a radical rewriting one day 
(digester2), i (for one) will be willing to review and apply patches to 
the digester one code stream for the foreseeable future. IMHO digester 
1 is approaching feature completeness (at least, given the limits of 
backwards compatibility) and should continue to be maintained as a 
mature, stable, well tested library. looking at performance issues now 
seems appropriate (though it's not a particular itch of mine and i'm 
not likely to spearhead any comprehensive effort).

> 2) What would be the preferred way of submitting them?
>  I was thinking of submitting a tweaked class as an
> enhancement request with an attached patch and maybe a
> unit test that measured both the old and new code.
> People could use the test to try the changes on other
> platforms (I'd only be testing on some Win32 sdk
> versions, but the fixes I have in mind should either
> help or at least do no harm on other platforms).

i've been thinking about the problem of proving performance 
improvements by using unit tests for a while now. i'd really like to be 
able to create reports about the current performance of 
library code. maybe it'd be possible to use some kind of normalization 
to eliminate (or at least reduce) platform specific differences. i'd be 
interested to hear comments from other folks about this (or ideally, 
hear about a tool out there which does this ;)

so, even if no tool exists (at the moment), it'd be great to have unit 
tests that demonstrate the performance improvement. that way, once a 
tool exists, we can just plug it straight in.

in terms of submitting patches, if you haven't taken a look already, 
read the standard stuff on submitting patches on the web site and 
attach them to bugzilla enhancements. (IIRC the lists now strip most 
attachments to limit stress caused by viruses.) you might like to post 
an email to the list explaining the changes and linking to the request 
(bugzilla messages often slip through my filters). it's better to 
create many small requests (one per improvement) rather than one large 
one. it's hard to verify large patches and so they tend to get pushed 
down the priority list.

> How much of a gain people would see in real use of
> course would depend on what they were doing; I'm
> expecting these fixes to matter more in situations
> where digesters would run frequently (e.g. SOAP) and
> developers have, where feasible, already dealt with
> the obvious (factoring out rule+parser factory+parser
> instantiations).

i think that it'd be an excellent idea to collate the collective 
community knowledge about real life digester performance. the wiki 
(http://wiki.apache.org/jakarta-commons) seems like the right place for 
something like this. it'd be really great if you could pull something 
together on this.

- robert

