Posted to dev@cocoon.apache.org by Berin Loritsch <bl...@apache.org> on 2001/04/05 15:41:20 UTC

SOT: TODO medium priority issue

SOT (Semi-Off-Topic)

In Cocoon 2's TODO list under "medium" priority, it mentions that
we should have this great and mighty profiler.  In the past I have
given it a lot of thought, because I wanted to implement it.
Unfortunately, due to the SAX model, this feature will add more
complexity than it is really worth.  The reason is that every stage
is executed simultaneously, or more precisely interleaved: a
Generator's startElement() will propagate through the different
stages and cause the Transformers and Serializers to act on it
before the next SAX event is fired.
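To make the interleaving concrete, here is a minimal Java sketch.  It
does not use Cocoon's API: `PushPipelineDemo` and its `stage(...)`
helper are hypothetical, and each stage is reduced to a
`Consumer<String>` that records an event and pushes it downstream.
The resulting trace shows every stage handling "start foo" before the
generator even produces "end foo".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified model of a SAX-style push pipeline: each stage
// is just a Consumer<String> that receives an event name and forwards it.
public class PushPipelineDemo {
    static final List<String> trace = new ArrayList<>();

    // Builds a stage that records the event, then pushes it downstream.
    static Consumer<String> stage(String name, Consumer<String> next) {
        return event -> {
            trace.add(name + ":" + event);
            if (next != null) {
                next.accept(event);
            }
        };
    }

    public static void main(String[] args) {
        Consumer<String> serializer = stage("serializer", null);
        Consumer<String> transformer = stage("transformer", serializer);
        // The "generator" fires two events; each one travels through every
        // stage before the next event is produced.
        for (String event : new String[] {"start foo", "end foo"}) {
            trace.add("generator:" + event);
            transformer.accept(event);
        }
        System.out.println(trace);
    }
}
```

Running it prints the generator, transformer and serializer entries
for "start foo" first, then the same three for "end foo", so there is
no point in time where "the generator is done" and "the transformer
starts".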

We can easily get a rough view of the performance at the setup
and page-generation levels, but there is nothing that will let
us easily split generation time from transformation and
serialization time.

Consider the following simple sequence diagram:

Generator       Transformer       Transformer       Serializer
    |                |                 |                |
    | start "foo"    |                 |                |
    +--------------->|  start "foo"    |                |
    |                +---------------->|   start "bar"  |
    |   end "foo"    |                 +--------------->|
    +--------------->|    end "foo"    |                |
    |                +---------------->|     end "bar"  |
    |                |                 +--------------->|
    |                |                 |                |

With this approach, you would have to find the difference in
timings.  You measure how long the "start foo" event takes from
the moment the Generator calls the next stage until that call
returns, and you repeat the measurement for each stage.  The time
the second transformer takes to call the serializer with the
corresponding event ("start bar" in the diagram) and get control
back is the time the serializer spent processing it.  You
subtract that time from the time the first transformer takes to
call the second transformer with "start foo" and get control
back, and you have how long the second transformer took.  You
repeat the algorithm by subtracting the cumulative time of the
first transformer's call from the total time of the Generator's
call, which gives you how long the first transformer took.
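The subtraction for a single SAX event can be sketched as follows.  This
is an illustration under assumptions, not an existing Cocoon class: the
hypothetical `StageTimes.exclusiveTimes` takes the inclusive
(call-to-return) time of each hop along the chain and returns how long
the receiving stage spent on its own.  The timings in `main` are made up.

```java
import java.util.Arrays;

// Sketch of the per-event subtraction, assuming we already have, for ONE
// SAX event, the inclusive (call-to-return) time of each hop:
//   inclusive[0] = generator -> first transformer
//   inclusive[1] = first transformer -> second transformer
//   inclusive[2] = second transformer -> serializer
// The stage receiving hop k spent inclusive[k] - inclusive[k + 1] on its own;
// the last stage gets its whole inclusive time.
public class StageTimes {
    static long[] exclusiveTimes(long[] inclusive) {
        long[] exclusive = new long[inclusive.length];
        for (int k = 0; k < inclusive.length; k++) {
            long downstream = (k + 1 < inclusive.length) ? inclusive[k + 1] : 0L;
            exclusive[k] = inclusive[k] - downstream;
        }
        return exclusive;
    }

    public static void main(String[] args) {
        // Made-up nanosecond timings for the "start foo" event.
        long[] inclusive = {90, 60, 25};
        // Prints [30, 35, 25]: first transformer, second transformer, serializer.
        System.out.println(Arrays.toString(exclusiveTimes(inclusive)));
    }
}
```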

You have to repeat that algorithm for EACH SAX EVENT, and add up
the results for each stage to find out the relative time spent
in each one.  This will degrade performance so badly that
Cocoon 2 will seem like it is crawling.
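Accumulating the per-event results into per-stage totals might look
like the sketch below.  `StageTotals` is a hypothetical name, and the
input layout (one array of inclusive hop timings per event) is an
assumption.  In a real instrumented pipeline every hop of every event
would need its own pair of clock reads, which is where the overhead
described above comes from.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical accumulator: for every SAX event we get the inclusive hop
// timings (hop k = call into stage k, as in the per-event sketch) and add each
// stage's exclusive share to a running per-stage total.
public class StageTotals {
    private final long[] totals;

    StageTotals(int stages) {
        totals = new long[stages];
    }

    void addEvent(long[] inclusive) {
        // Assumed convention: an event that a stage swallows simply yields a
        // shorter array, so later stages are not charged for it.
        if (inclusive.length > totals.length) {
            throw new IllegalArgumentException("more hops than stages");
        }
        for (int k = 0; k < inclusive.length; k++) {
            long downstream = (k + 1 < inclusive.length) ? inclusive[k + 1] : 0L;
            totals[k] += inclusive[k] - downstream;
        }
    }

    long[] totals() {
        return totals.clone();
    }

    public static void main(String[] args) {
        StageTotals totals = new StageTotals(3);
        // Made-up timings for "start foo" and "end foo".
        List<long[]> events = List.of(new long[] {90, 60, 25}, new long[] {40, 30, 10});
        for (long[] inclusive : events) {
            totals.addEvent(inclusive);
        }
        // Prints [40, 55, 35].
        System.out.println(Arrays.toString(totals.totals()));
    }
}
```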

The biggest problem with that is: what if one of the
transformation stages removes the SAX event from the pipeline?
On the other hand, you would find out how much time processing
spent in each step, and for each SAX event, which would help you
determine EXACTLY where in the page the slowdown occurs.  The
other issue is: how do we measure general delays between SAX
events for the Generator?  My assumption is that the clock would
have to run continuously for the generator, with the results for
each downstream stage subtracted out.
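The "continuously ticking" clock for the generator could be sketched
like this.  `GeneratorClock` is hypothetical, and it assumes every call
the generator makes downstream can be wrapped.  The clock is injected
as a `LongSupplier` (in practice `System::nanoTime`) so the example can
use a fake clock.  Anything not spent inside a downstream call,
including the gaps between SAX events, is charged to the generator.

```java
import java.util.function.LongSupplier;

// Generator time = total elapsed time - time spent inside downstream calls,
// so delays BETWEEN events (parsing, I/O, ...) are charged to the generator.
public class GeneratorClock {
    private final LongSupplier clock;
    private long start;
    private long downstream;

    GeneratorClock(LongSupplier clock) {
        this.clock = clock;
    }

    void begin() {
        start = clock.getAsLong();
        downstream = 0L;
    }

    // Times one downstream call (e.g. forwarding a startElement) so it can
    // be subtracted from the generator's own share.
    void timeDownstream(Runnable call) {
        long before = clock.getAsLong();
        call.run();
        downstream += clock.getAsLong() - before;
    }

    long generatorTime() {
        return clock.getAsLong() - start - downstream;
    }

    public static void main(String[] args) {
        long[] now = {0};                      // fake clock, in made-up ticks
        GeneratorClock c = new GeneratorClock(() -> now[0]);
        c.begin();
        now[0] += 5;                           // generator works before event 1
        c.timeDownstream(() -> now[0] += 20);  // downstream handles event 1
        now[0] += 3;                           // gap between events
        c.timeDownstream(() -> now[0] += 10);  // downstream handles event 2
        now[0] += 2;                           // generator finishes up
        System.out.println(c.generatorTime()); // prints 10 (= 5 + 3 + 2)
    }
}
```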

The good news is that with the skeleton for the new pipeline
that we now have in CVS, we can implement the algorithm in
a ProfilingEventPipeline object and possibly a
CacheProfilingEventPipeline.  Stream profiling is very easy,
so it is not as major a concern.

---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

Re: SOT: TODO medium priority issue

Posted by Giacomo Pati <gi...@apache.org>.

Berin Loritsch wrote:
> The good news is that with the skeleton for the new pipeline
> that we now have in CVS, we can implement the algorithm in
> a ProfilingEventPipeline object and possibly a
> CacheProfilingEventPipeline.  Stream profiling is very easy,
> so it is not as major a concern.

Exactly the beauty of the component management in C2 :) Think also of
replacing the NullSAXConnector with a TimingSAXConnector for measurement,
together with your ProfilingEventPipeline, which would only have to
collect the timings from those SAXConnectors after processing the
pipeline to report the statistics.
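A timing connector along those lines might look like the sketch below.
It is built on the JDK's `org.xml.sax.helpers.XMLFilterImpl`, which
forwards every event to its `ContentHandler`, rather than on Cocoon's
own SAXConnector interface, whose exact signatures are not assumed
here.  The class name matches the proposal but the code is
hypothetical, and only element and character events are timed.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical TimingSAXConnector: sits between two stages (where a
// NullSAXConnector would just pass events through), forwards each event,
// and accumulates the inclusive time spent downstream of it.
public class TimingSAXConnector extends XMLFilterImpl {
    private long downstreamNanos;
    private int events;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        long before = System.nanoTime();
        super.startElement(uri, localName, qName, atts); // forwards downstream
        record(before);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        long before = System.nanoTime();
        super.endElement(uri, localName, qName);
        record(before);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        long before = System.nanoTime();
        super.characters(ch, start, length);
        record(before);
    }

    private void record(long before) {
        downstreamNanos += System.nanoTime() - before;
        events++;
    }

    long downstreamNanos() { return downstreamNanos; }

    int events() { return events; }

    public static void main(String[] args) throws SAXException {
        TimingSAXConnector connector = new TimingSAXConnector();
        connector.setContentHandler(new DefaultHandler()); // no-op "serializer"
        connector.startElement("", "foo", "foo", new AttributesImpl());
        connector.endElement("", "foo", "foo");
        System.out.println(connector.events() + " events, "
                + connector.downstreamNanos() + " ns downstream");
    }
}
```

With one such connector between each pair of stages, a profiling
pipeline would only need to read `downstreamNanos()` from each of them
after processing and apply the subtraction described earlier in the
thread.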

A future CachingEventPipeline will probably use different instances of
SAXConnectors for measurement of cache efficiency, for cache "tee"ing
and cache replay.
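Cache "tee"ing and replay could be sketched with the same kind of
connector: it forwards events downstream as usual and also records
them so the stream can be pushed into another consumer later.
`TeeSAXConnector` and its `replay` method are hypothetical, and only
element events are recorded here; a real cache would need every
`ContentHandler` event.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical "tee" connector: forwards each event AND records it.
public class TeeSAXConnector extends XMLFilterImpl {
    // One recorded event that can replay itself onto any ContentHandler.
    interface Recorded {
        void replay(ContentHandler target) throws SAXException;
    }

    private final List<Recorded> cache = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl copy = new AttributesImpl(atts); // caller may reuse atts
        cache.add(target -> target.startElement(uri, localName, qName, copy));
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        cache.add(target -> target.endElement(uri, localName, qName));
        super.endElement(uri, localName, qName);
    }

    // Cache replay: push the recorded events into a new consumer.
    void replay(ContentHandler target) throws SAXException {
        for (Recorded event : cache) {
            event.replay(target);
        }
    }

    int size() { return cache.size(); }

    public static void main(String[] args) throws SAXException {
        TeeSAXConnector tee = new TeeSAXConnector();
        tee.setContentHandler(new DefaultHandler()); // live consumer (no-op here)
        tee.startElement("", "foo", "foo", new AttributesImpl());
        tee.endElement("", "foo", "foo");
        tee.replay(new DefaultHandler());            // later, from the cache
        System.out.println(tee.size() + " events cached"); // prints "2 events cached"
    }
}
```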

Giacomo
