Posted to dev@cocoon.apache.org by Upayavira <uv...@upaya.co.uk> on 2003/04/16 19:38:00 UTC

CLI caching, etc (was Re: New error handling)

Vadim,

> >>1. Implement setStatus() in AbstractCommandLineEnvironment 
> >>(implementation is empty right now)
> >>2. Add getStatus() to the AbstractCommandLineEnvironment
> >>3. Test getStatus() in the CLI crawling code.
> >>4. Test how it works and fix the broken link :)

Works a treat! Thanks. Although I had to modify the sitemap to give error codes 
(thanks Jeremy for your recent mail!)
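
For anyone following along, the four steps above could be sketched roughly as follows. This is only an illustration: SketchCommandLineEnvironment and SketchCrawler are hypothetical stand-ins, not the real AbstractCommandLineEnvironment or CLI crawler, and treating any status of 400 or above as a broken link is an assumption.

```java
// Hypothetical stand-in for AbstractCommandLineEnvironment (not the real Cocoon class).
class SketchCommandLineEnvironment {
    // Assumption: default to 200 so a pipeline that never sets a status counts as OK.
    private int statusCode = 200;

    // Step 1: record the status instead of ignoring it.
    public void setStatus(int statusCode) {
        this.statusCode = statusCode;
    }

    // Step 2: let the caller read the status back after processing.
    public int getStatus() {
        return statusCode;
    }
}

// Hypothetical stand-in for the CLI crawling code.
class SketchCrawler {
    // Step 3: treat any 4xx/5xx status as a broken link (assumed threshold).
    static boolean isBrokenLink(SketchCommandLineEnvironment env) {
        return env.getStatus() >= 400;
    }
}
```

The sitemap still has to set the error code for this to be of any use, which is why the sitemap change was needed.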

> Not will, but does! This was done a long time ago (for http); otherwise,
> how would you get a 404 in the browser? :)

That's kinda what I meant ;-) 

> >Similarly, based upon comments from Nicola Ken ages ago:
> >
> >>>In the Environment there is
> >>>
> >>>    boolean isResponseModified(long lastModified);
> >>>    void setResponseIsNotModified();
> >>>
> >>>But it's never implemented. In AbstractEnvironment:
> >>>
> >>>    public boolean isResponseModified(long lastModified) {
> >>>        return true; // always modified
> >>>    }
> >>>
> >>>    public void setResponseIsNotModified() {
> >>>        // does nothing
> >>>    }
> >
> >Similarly, the setResponseIsNotModified() will be called on the
> >current environment if a response was read from the cache. At
> >present, this method does nothing.

> Before you go further with this... Look at method isResponseModified()
> in [1].
>  
> What you need to do is to:
> 1. Implement method isResponseModified() for command line environment.
> 2. In the CLI, get the file corresponding to the request URI, and get
>    its last modification time.
> 3. Populate environment with this modification time (this will be
>    similar to If-Modified-Since date header in http).
> 4. Call cocoon. It will skip generation if response is not modified,
>    and won't even read it from cache.

Very interesting. So Cocoon can tell me if something has been modified. Great. 
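
A rough sketch of how steps 1 to 3 might fit together is below. Only the two method signatures, isResponseModified(long) and setResponseIsNotModified(), come from the Environment interface quoted above. The class name, the ifModifiedSince field, the lastModifiedOf() helper and the use of -1 for "no previous output" are all assumptions made for illustration.

```java
import java.io.File;

// Hypothetical stand-in for a command line environment (not the real Cocoon class).
class SketchModifiedEnvironment {
    // Timestamp of the previously generated file; plays the role of the
    // If-Modified-Since header. -1 means "no previous output" (assumed convention).
    private final long ifModifiedSince;
    private boolean notModified = false;

    SketchModifiedEnvironment(long ifModifiedSince) {
        this.ifModifiedSince = ifModifiedSince;
    }

    // Step 2: read the timestamp of the existing output file, if there is one.
    static long lastModifiedOf(File target) {
        return target.exists() ? target.lastModified() : -1L;
    }

    // Step 1: the response counts as modified when there is no previous
    // output, or when the source is newer than that output.
    public boolean isResponseModified(long lastModified) {
        return ifModifiedSince < 0 || lastModified > ifModifiedSince;
    }

    // Remember that generation was skipped so the CLI can react to it.
    public void setResponseIsNotModified() {
        notModified = true;
    }

    public boolean isNotModified() {
        return notModified;
    }
}
```

Step 3 is then just constructing the environment with lastModifiedOf(file) before handing it to Cocoon for step 4.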

However, if the Bean is able to send pages to various locations, it might not be able to 
find out when a page was last generated without incurring network traffic (e.g. when using FTP). 
This would be unfortunate, as a large site could involve a lot of network traffic, and 
the point of this is to avoid that.

I could store locally (in my own hashed up cache) the last modified date for the page 
and the list of links within the page, each time a page is generated. That way, when I 
am about to generate a page, I can easily get its timestamp. If I find that I don't need 
to generate the page, I can use my locally held list of links to follow.

Does this seem reasonable?
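
To make the idea concrete, here is a minimal in-memory sketch of such a cache. SketchPageCache and its method names are hypothetical, -1 is an assumed marker for "never generated", and saving the cache to disk between runs is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local cache: per page URI, the last modified date and the links found.
class SketchPageCache {
    private static final class Entry {
        final long lastModified;
        final List<String> links;

        Entry(long lastModified, List<String> links) {
            this.lastModified = lastModified;
            // Defensive copy so later changes to the caller's list do not leak in.
            this.links = Collections.unmodifiableList(new ArrayList<>(links));
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Called each time a page is generated.
    void record(String uri, long lastModified, List<String> links) {
        entries.put(uri, new Entry(lastModified, links));
    }

    // Timestamp to hand to the environment; -1 if the page was never generated.
    long lastModifiedOf(String uri) {
        Entry e = entries.get(uri);
        return e == null ? -1L : e.lastModified;
    }

    // Links to follow when generation is skipped; empty if the page is unknown.
    List<String> linksOf(String uri) {
        Entry e = entries.get(uri);
        return e == null ? Collections.<String>emptyList() : e.links;
    }
}
```

When a page turns out not to need regenerating, the crawler would carry on with linksOf(uri) instead of parsing fresh output.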

And finally, I have got code working to make the CLI use ModifiableSources rather 
than Destination objects. Do you think I need to support the Destination interface still 
(and deprecate it), or can I just delete it entirely?

Once I've got this going, I'll get on with attempting a VFS ModifiableSource (probably 
once I've had a three week holiday in South Africa!).

Thanks again.

Regards, Upayavira



Re: CLI caching, etc

Posted by Vadim Gritsenko <va...@verizon.net>.
Upayavira wrote:

>>>>Before you go further with this... Look at method
>>>>isResponseModified() in [1].
>>>>
>>>>What you need to do is to:
>>>>1. Implement method isResponseModified() for command line
>>>>   environment.
>>>>2. In the CLI, get the file corresponding to the request URI, and
>>>>   get its last modification time.
>>>>3. Populate environment with this modification time (this will be
>>>>   similar to If-Modified-Since date header in http).
>>>>4. Call cocoon. It will skip generation if response is not
>>>>   modified, and won't even read it from cache.
>>>>
>>>Very interesting. So Cocoon can tell me if something has been
>>>modified. Great. 
>>>
>>Yes, and it works in http env.
>
>I've implemented something around this, with a cache that seems more or less to 
>work. However, when I run org.apache.cocoon.Cocoon.process(), my methods that 
>I've implemented on the AbstractCommandLineEnvironment do not get called (i.e. 
>isResponseModified and setResponseIsNotModified). What do I need to do to get 
>Cocoon to actually call these methods on my environment?
>

/me doing some digging...

There is a reference to isResponseModified in 
AbstractProcessingPipeline.checkLastModified [1]. From what I see, this 
method is intended to work only for readers, which is unfortunate. What 
do you think - can we extend the pipeline implementation to support this 
for event pipelines too? Sylvain / Carsten, opinion? :-)

(I don't have a Cocoon checkout at hand right now, so I can't give better 
advice)

Vadim

[1] 
http://cvs.apache.org/viewcvs.cgi/cocoon-2.1/src/java/org/apache/cocoon/components/pipeline/AbstractProcessingPipeline.java?rev=1.1&content-type=text/vnd.viewcvs-markup



Re: CLI caching, etc

Posted by Upayavira <uv...@upaya.co.uk>.
> >>Before you go further with this... Look at method
> >>isResponseModified() in [1].
> >> 
> >>What you need to do is to:
> >>1. Implement method isResponseModified() for command line
> >>   environment.
> >>2. In the CLI, get the file corresponding to the request URI, and
> >>   get its last modification time.
> >>3. Populate environment with this modification time (this will be
> >>   similar to If-Modified-Since date header in http).
> >>4. Call cocoon. It will skip generation if response is not
> >>   modified, and won't even read it from cache.
> >
> >Very interesting. So Cocoon can tell me if something has been
> >modified. Great. 

> Yes, and it works in http env.

I've implemented something around this, with a cache that seems more or less to 
work. However, when I run org.apache.cocoon.Cocoon.process(), my methods that 
I've implemented on the AbstractCommandLineEnvironment do not get called (i.e. 
isResponseModified and setResponseIsNotModified). What do I need to do to get 
Cocoon to actually call these methods on my environment?

> >Once I've got this going, I'll get on with attempting a VFS
> >ModifiableSource (probably once I've had a three week holiday in
> >South Africa!).

> 3 week... Lucky you.

But it'll be three weeks without Cocoon :-(

Regards, Upayavira

Re: CLI caching, etc (was Re: New error handling)

Posted by Vadim Gritsenko <va...@verizon.net>.
Upayavira wrote:

>Vadim,
>
>  
>
>>>>1. Implement setStatus() in AbstractCommandLineEnvironment 
>>>>(implementation is empty right now)
>>>>2. Add getStatus() to the AbstractCommandLineEnvironment
>>>>3. Test getStatus() in the CLI crawling code.
>>>>4. Test how it works and fix the broken link :)
>>>>        
>>>>
>
>Works a treat! Thanks. Although I had to modify the sitemap to give error codes 
>(thanks Jeremy for your recent mail!)
>  
>

Great.
...

>>Before you go further with this... Look at method isResponseModified()
>>in [1].
>> 
>>What you need to do is to:
>>1. Implement method isResponseModified() for command line environment.
>>2. In the CLI, get the file corresponding to the request URI, and get
>>   its last modification time.
>>3. Populate environment with this modification time (this will be
>>   similar to If-Modified-Since date header in http).
>>4. Call cocoon. It will skip generation if response is not modified,
>>   and won't even read it from cache.
>
>Very interesting. So Cocoon can tell me if something has been modified. Great. 
>

Yes, and it works in http env.


>However, if the Bean is able to send pages to various locations, it might not be able to 
>identify when a page was generated without network traffic (e.g when using FTP).
>

In the case of FTP, you can retrieve the timestamp of the file from the 
remote FTP server (which is tricky). You can do "ls -l" to get the 
timestamps for all the files in the directory, and save them in a hash.


>This would be unfortunate, as a large site could involve a lot of network traffic, and 
>the point of this is to avoid that.
>
>I could store locally (in my own hashed up cache) the last modified date for the page 
>and the list of links within the page, each time a page is generated. That way, when I 
>am about to generate a page, I can easily get its timestamp. If I find that I don't need 
>to generate the page, I can use my locally held list of links to follow.
>
>Does this seem reasonable?
>

It does not seem unreasonable, so it should be reasonable :)


>And finally, I have got code working to make the CLI use ModifiableSources rather 
>than Destination objects. 
>

Cool


>Do you think I need to support the Destination interface still 
>(and deprecate it), or can I just delete it entirely?
>

No; delete it entirely, since it was never released. No need to support 
never-released stuff.


>Once I've got this going, I'll get on with attempting a VFS ModifiableSource (probably 
>once I've had a three week holiday in South Africa!).
>

3 weeks... Lucky you.

Vadim


>Thanks again.
>
>Regards, Upayavira
>