Posted to dev@forrest.apache.org by Upayavira <uv...@upaya.co.uk> on 2003/08/17 17:41:33 UTC

CLI Caching, etc

Just to keep you up to date with my work on the CLI:

I found a bug which meant that Cocoon wasn't holding onto its cache each 
time it shut down and restarted, which explained why the CLI wasn't 
using its cache. Vadim fixed the bug.

So, the CLI can now read out of the cache correctly. However, it seems 
that the cache is either slightly slower than page generation, or much 
the same, so there's no real benefit from this bug fix, at least in this area.

Also, because pages now come from the cache, pipelines aren't processed 
and the LinkGatherer component no longer works, so we only have the link 
view for gathering links to follow :-(

The one benefit of this is that it is now easy to identify whether a 
page came out of the cache and, if it did, to compare the timestamp of 
the file on disc with the timestamp of the cached element, saving to 
disc only if the cached element is newer. So we haven't yet sped things 
up, but we have got it to update only changed files.
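
In code terms, the decision is roughly this - just a sketch, with 
invented names rather than the real Cocoon classes:

    import java.io.File;

    // Sketch: decide whether a destination file needs rewriting, given
    // the last-modified time of the cached element. Hypothetical names.
    public class SaveDecision {

        public static boolean shouldSave(File destination,
                                         long cachedLastModified) {
            // A file never written before must always be saved.
            if (!destination.exists()) {
                return true;
            }
            // Write only when the cached element is newer than the
            // copy already on disc.
            return cachedLastModified > destination.lastModified();
        }
    }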

So, if you are happy with link view (for the moment), and like the idea 
of only updating pages that have changed, then update to CVS Cocoon. 
Otherwise, stick with the one you've got.

I would be interested in your comments upon this mixed set of consequences.

I will try to get linkGathering working again, but it does involve 
digging a bit further than I'm used to.

Regards, Upayavira



Re: CLI Caching, etc

Posted by Upayavira <uv...@upaya.co.uk>.
Jeff Turner wrote:

>On Sun, Aug 17, 2003 at 04:41:33PM +0100, Upayavira wrote:
>
>>Just to keep you up to date with my work on the CLI:
>
>Appreciated.
>
>>I found a bug which meant that Cocoon wasn't holding onto its cache each 
>>time it shut down and restarted, which explained why the CLI wasn't 
>>using its cache. Vadim fixed the bug.
>>
>>So, the CLI can now read out of the cache correctly. However, it seems 
>>that the cache is either slightly slower than page generation, or much 
>>the same, so there's no real benefit from this bug fix, at least in this area.
>
>It would be interesting to know how much of the pipeline is actually
>cached.  The times for a first and second Forrest run are 2:36 and 2:39
>respectively, and they're suspiciously similar; as if the cache is being
>checked but not used.  For instance, rendering site.pdf takes 20s on
>first and second rendering.  The timestamps are very useful, btw!
>
From stepping through the code so far, I can see that entries are coming 
out of the cache correctly - store.get() returns something each time. So 
I haven't yet worked out why it takes so long. But I will - someday :-(

>>Also, because pages now come from the cache, pipelines aren't processed 
>>and the LinkGatherer component no longer works, so we only have the link 
>>view for gathering links to follow :-(
>
>Hmm.. tricky.  If the LinkGatherer output is a byproduct of running the
>pipeline, and the pipeline output is cached, then perhaps the
>LinkGatherer output should also be cached?
>
Yes, that's what I'd like. But getting it cached is still a little 
beyond my level, and involves hacking around in places where I feel a 
bit uncomfortable. Again, I'll get there though.
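
Roughly what I have in mind, sketched with made-up names (the real 
store and cache-key APIs differ):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch: keep the gathered links for a page in the cache alongside
    // the page itself, under a key derived from the same pipeline key.
    public class LinkCache {

        private final Map store = new HashMap(); // key -> List of links

        // Called by the link gatherer after a full (non-cached) run.
        public void put(String pipelineKey, List gatheredLinks) {
            store.put(pipelineKey + "/links", gatheredLinks);
        }

        // Called when the page itself comes out of the cache, so the
        // crawler can still follow links without re-running the pipeline.
        public List get(String pipelineKey) {
            return (List) store.get(pipelineKey + "/links");
        }
    }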

>>The one benefit of this is that it is now easy to identify whether a 
>>page came out of the cache and, if it did, to compare the timestamp of 
>>the file on disc with the timestamp of the cached element, saving to 
>>disc only if the cached element is newer. So we haven't yet sped things 
>>up, but we have got it to update only changed files.
>
>I don't really understand this.  Surely if site.pdf takes 20s on first
>and second rendering, it's updating an unchanged file?
>
Because you actually have to generate the page (or at least get it out 
of the cache) in order to work out whether it has changed. So the time 
taken is the same, but the file is not written. It might be possible to 
get just the timestamp out of the cache, which would be quicker. This 
could benefit the servlet too - pass the last-modified date in the 
environment and let the caching pipeline first check whether the page 
has changed before retrieving the whole page. But still - a bit beyond 
me right now.
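
The idea, as a very rough sketch (all names invented, not the real 
caching pipeline API):

    // Sketch: answer "has this page changed?" from the cached timestamp
    // alone, without deserialising the whole cached page.
    public class FreshnessCheck {

        interface TimestampStore {
            // Last-modified time of the cached element, or -1 if absent.
            long getLastModified(String key);
        }

        // True when the full page must be fetched and written out;
        // false when the caller's copy is already up to date.
        public static boolean pageChanged(TimestampStore store, String key,
                                          long callerLastModified) {
            long cachedLastModified = store.getLastModified(key);
            if (cachedLastModified < 0) {
                return true; // nothing cached: generate the page anyway
            }
            return cachedLastModified > callerLastModified;
        }
    }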

>>So, if you are happy with link view (for the moment), and like the idea 
>>of only updating pages that have changed, then update to CVS Cocoon. 
>>Otherwise, stick with the one you've got.
>
>Oh well, link-view lets us filter out unwanted links, even if it's really
>the user-agent's job (you convinced me;), so I'm happy with CVS.
>
Good.

>Oh, mind if I make one change to the output?  Instead of having the time
>on a separate line:
>
>* [0] document-v12.pdf
>         [1.356 seconds]
>* [38] community/howto/index.html
>         [0.524 seconds]
>* [0] community/howto/index.pdf
>         [0.262 seconds]
>* [0] /favicon.ico
>         [0.052 seconds]
>
>
>Have the times right-indented:
>
>
>* [0] document-v12.pdf                  [1.356 seconds]
>* [38] community/howto/index.html       [0.524 seconds]
>* [0] community/howto/index.pdf         [0.262 seconds]
>* [0] /favicon.ico                      [0.052 seconds]
>
>It saves lots of screen bandwidth, and makes the output more parseable.
>
In my first version, that's how I had it (though without the 
right-justification). But the version in CVS is a System.out.println 
hack that works around the BeanListener code. When I get to improving 
the bean listener code, I'll add a way for the bean to report back, and 
then the bean listener implementation can display it however it likes.
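
For what it's worth, once the listener gets the raw data the formatting 
is trivial - roughly this (a sketch; the method is invented, not the 
real BeanListener interface):

    // Sketch: pad the page name to a fixed column so the times line up.
    public class ConsoleListener {

        public void pageGenerated(String uri, int links, double seconds) {
            StringBuffer line = new StringBuffer();
            line.append("* [").append(links).append("] ").append(uri);
            while (line.length() < 40) {
                line.append(' '); // pad out to column 40
            }
            line.append("[").append(seconds).append(" seconds]");
            System.out.println(line.toString());
        }
    }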

>>I would be interested in your comments upon this mixed set of consequences.
>>
>>I will try to get linkGathering working again, but it does involve 
>>digging a bit further than I'm used to.
>
>In true open-source fashion, we'll be here in our armchairs cheering you
>on ;)  I saw you commit some CLI refactorings - does that mean the code is
>stable enough yet for bystanders to start poking & trying to understand
>things?
>
I would say the code is stable, with the provisos we've discussed. But I 
do have further improvements planned over the next few weeks (relevant 
particularly to you: an Ant task, an <exclude> config element, and the 
ability to use link view without rewriting filenames).

Regards, Upayavira


