Posted to dev@cocoon.apache.org by Unico Hommes <Un...@hippo.nl> on 2003/11/03 11:29:25 UTC

Improving HTTP protocol handling (Was: RE: Fooling around with cocoon davmap)

> 
> -----Original Message-----
> From: Sylvain Wallez [mailto:sylvain@apache.org] 
> Sent: maandag 3 november 2003 9:52
> To: dev@cocoon.apache.org
> 
> Unico Hommes wrote:
> 
> <snip/>
> 
> >
> >We were talking about the fact that it seemed impossible to 
> serve a request without also sending an entity body along 
> with the response (short of suppressing the output in the 
> serializer, which is more of a hack than a solution). I thought 
> it was allowed to call a flow function and then not send a 
> page, but apparently I was wrong. Stefano agreed that it should 
> be legal to call a flow function that does not redirect to a 
> page, in order to cover the full range of HTTP better.
> >
> >Specifically we were discussing the specification of the 
> OPTIONS method that prescribes that "the response MUST NOT 
> include entity information other than what can be considered 
> as communication options" which seems to exclude sending an 
> entity body from being such a legal response.
> >
> >I traced the above location as the place the code would need 
> to be changed in order to achieve this. But I could be wrong.
> >  
> >
> 
> Sorry to say that, but... yes, I think so ;-)
> 
> IMO, this should be handled at the pipeline level, i.e. on a 
> HEAD request, the pipeline should be built and setup, but not 
> executed. And this for several reasons:
> - not every request is handled by flowscript
> - some pipeline components set response headers, such as the 
> i18n transformer or the browser selector.
> - if we use the pipeline key as the Etag (see below), the 
> pipeline must be built and setup to compute that key.
> 

Good point, we need to do that too, but not having to send a page from
the flow could also help us in other situations where we don't need
access to the pipeline. Think OPTIONS, TRACE, MKCOL, PUT, etc. Or do you
think these should also be handled at the pipeline level?

> Note that this pipeline-level handling is different from 
> fooling the serializer by sending its output to /dev/null, 
> since the processing chain is setup to get all required 
> information, but not executed.
> 
> Actually, this is not very different from what happens today 
> when content is retrieved from the cache (pipeline is built 
> and setup but not executed).
> 

OK. Are you saying then that the pipelines should be handling more low
level HTTP methods? Or do you see some other specialized component
handling this?

> >>BTW, can someone explain to me what ETags are about (I read that 
> in the HTTP RFC a long time ago, but did not really 
> understand it at that time).
> >>    
> >>
> >
> >I just looked. It seems entity tags are used as cache 
> validators, similar to the Last-Modified header I guess, i.e. 
> they encode the state of a resource entity so that clients 
> can optimize network calls by sending along headers like 
> If-Match, If-None-Match, If-Range, that are then checked 
> against the current value of the entity tag on the server. If 
> they match (or not) the method is executed. At least that's 
> what I got out of it.
> >  
> >
> 
> Don't really understand what resource _entity_ means, 

   "entity
      The information transferred as the payload of a request or
      response. An entity consists of metainformation in the form of
      entity-header fields and content in the form of an entity-body, as
      described in section 7."

> but it 
> looks like the pipeline cache key could be used for the ETag. 
> What do you think?
> 

I think so. The spec talks about weak and strong entity tags. I would
say the pipeline cache key qualifies as a weak one. Weak keys only
approximate semantic equivalence whereas strong keys reflect the
verbatim response. The pipeline cache key is weak for two reasons.
First, although the pipeline output may stay the same, the key doesn't
include information about the values of the response headers. Second,
the validity object the pipeline gets from the pipeline components
doesn't state that the content would be identical if the pipeline were
executed again, just that the pipeline shouldn't be executed.
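
To make the weak/strong distinction concrete, here is a rough sketch in Python (chosen only for brevity, this is not Cocoon code). All function names are made up for illustration, and hashing the response body with MD5 is just one assumed way to obtain a strong tag. The comparison rules follow my reading of the spec: a weak tag carries a W/ prefix and never matches under strong comparison. Lists of tags and the "*" wildcard in If-None-Match are left out.

```python
import hashlib


def strong_etag(body: bytes) -> str:
    # Assumption: a digest of the exact response bytes serves as a strong tag.
    return '"%s"' % hashlib.md5(body).hexdigest()


def weak_etag(cache_key: str) -> str:
    # Assumption: the pipeline cache key can be rendered as a string.
    # The W/ prefix marks the tag as weak.
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    return 'W/"%s"' % digest


def _opaque(tag: str) -> str:
    # Strip the weakness marker, leaving only the quoted opaque value.
    return tag[2:] if tag.startswith("W/") else tag


def weak_match(a: str, b: str) -> bool:
    # Weak comparison: opaque values must be equal, weakness is ignored.
    return _opaque(a) == _opaque(b)


def strong_match(a: str, b: str) -> bool:
    # Strong comparison: both tags must be strong and identical.
    return not a.startswith("W/") and not b.startswith("W/") and a == b


def conditional_get_status(if_none_match, current_etag):
    # A GET with a matching If-None-Match can be answered with 304.
    if if_none_match is not None and weak_match(if_none_match, current_etag):
        return 304
    return 200
```

With a weak tag derived from the cache key, a conditional GET can still be answered with 304, but the tag would not satisfy anything that requires strong comparison.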


> Sylvain
> 
> -- 
> Sylvain Wallez                                  Anyware Technologies
> http://www.apache.org/~sylvain           http://www.anyware-tech.com
> { XML, Java, Cocoon, OpenSource }*{ Training, Consulting, 
> Projects } Orixo, the opensource XML business alliance  -  
> http://www.orixo.com
> 
> 
> 
>

Re: Improving HTTP protocol handling (Was: RE: Fooling around with cocoon davmap)

Posted by Sylvain Wallez <sy...@apache.org>.

Unico Hommes wrote:

<snip/>

>>IMO, this should be handled at the pipeline level, i.e. on a HEAD request, the pipeline should be built and setup, but not executed. And this for several reasons:
>>- not every request is handled by flowscript
>>- some pipeline components set response headers, such as the i18n transformer or the browser selector.
>>- if we use the pipeline key as the Etag (see below), the pipeline must be built and setup to compute that key.
>>    
>>
>
>Good point, we need to do that too, but not having to send a page from the flow could also help us in other situations where we don't need access to the pipeline. Think OPTIONS, TRACE, MKCOL, PUT, etc. Or do you think these should also be handled at the pipeline level?
>  
>

HEAD is a bit special here since it can be considered as a 
"stripped-down" version of GET and as such doesn't require special 
application-level handling.

Other methods need to trigger some application-specific behaviour that 
must be handled somehow. But I see your point: some methods don't ask for a 
response body. We currently have no way to express this, as the sitemap 
engine throws a RNFE (and hence a 404) if no pipeline was built.

To express this body-less response, several solutions come to mind:
- have a "null-reader" that allows building a pipeline that sends nothing
- have some new method on environment stating that no body is to be 
produced. But this requires a new sitemap statement.
- redirect to a special protocol ("null-body:"?) that indicates a 
body-less answer.

The first two solutions have the drawback of requiring some matching in 
the pipeline just to say that we don't want to generate a response body. 
This is useless (and CPU consuming) if the request handling is done in a 
flowscript.

The third solution (redirect) has the advantage of not adding a new 
sitemap statement and of being available at no extra cost from a flowscript 
(or an action). But it sounds a bit hacky.
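
To show what the third option could look like, here is a minimal sketch in Python (for brevity only, nothing here is existing Cocoon API). The "null-body:" scheme, the Response class, the handler table and the process function are all hypothetical names invented for this illustration. A handler sets headers and returns a redirect target, and the dispatcher skips pipeline building when the target uses the special scheme.

```python
NULL_BODY_SCHEME = "null-body:"  # hypothetical protocol name from the proposal


class Response:
    """Hypothetical stand-in for the response side of the environment."""

    def __init__(self):
        self.status = 200
        self.headers = {}
        self.body = b""


def handle_options(response):
    # Hypothetical flow function: sets communication options only,
    # then redirects to the body-less pseudo-protocol.
    response.headers["Allow"] = "GET, HEAD, OPTIONS, PUT, MKCOL"
    return NULL_BODY_SCHEME


def process(method, handlers, build_and_run_pipeline):
    response = Response()
    handler = handlers.get(method)
    if handler is None:
        # Simplified stand-in for "nothing handled this request".
        response.status = 404
        return response
    target = handler(response)
    if target.startswith(NULL_BODY_SCHEME):
        # Body-less answer: no pipeline is built, no matching is done.
        response.headers["Content-Length"] = "0"
        return response
    # Any other target goes through the normal pipeline path.
    response.body = build_and_run_pipeline(target)
    response.headers["Content-Length"] = str(len(response.body))
    return response
```

The point of the sketch is only that the flow function (or action) decides on its own that there is no body, so no pipeline matching is needed for that case.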

What do you think?

>>Note that this pipeline-level handling is different from fooling the serializer by sending its output to /dev/null, since the processing chain is setup to get all required information, but not executed.
>>
>>Actually, this is not very different from what happens today when content is retrieved from the cache (pipeline is built and setup but not executed).
>>    
>>
>
>OK. Are you saying then that the pipelines should be handling more low level HTTP methods? Or do you see some other specialized component handling this?
>  
>

Maybe just HEAD (see above).

>>>>BTW, can someone explain to me what ETags are about (I read that in the HTTP RFC a long time ago, but did not really understand it at that time).
>>>>        
>>>>
>>>I just looked. It seems entity tags are used as cache validators, similar to the Last-Modified header I guess, i.e. they encode the state of a resource entity so that clients can optimize network calls by sending along headers like If-Match, If-None-Match, If-Range, that are then checked against the current value of the entity tag on the server. If they match (or not) the method is executed. At least that's what I got out of it.
>>>      
>>>
>>Don't really understand what resource _entity_ means, 
>>    
>>
>
>   "entity
>      The information transferred as the payload of a request or
>      response. An entity consists of metainformation in the form of
>      entity-header fields and content in the form of an entity-body, as
>      described in section 7."
>  
>

Ah, ok. So nothing new, actually ;-)

>>but it looks like the pipeline cache key could be used for the ETag. 
>>What do you think?
>>
>
>I think so. The spec talks about weak and strong entity tags. I would say the pipeline cache key qualifies as a weak one. Weak keys only approximate semantic equivalence whereas strong keys reflect the verbatim response.
>

So strong keys can be e.g. the MD5 signature of the response body?

>The pipeline cache key is weak for two reasons. First, although the pipeline output may stay the same, the key doesn't include information about the values of the response headers. Second, the validity object the pipeline gets from the pipeline components doesn't state that the content would be identical if the pipeline were executed again, just that the pipeline shouldn't be executed.
>  
>

Mmmh... If this isn't true, then we have a serious problem, because the 
pipeline is not executed if the validity is valid. Or did I miss 
something?

Also, the rule for pipeline components should be that entity-header 
related headers (e.g. the Vary header set by the browser selector) should 
be set at pipeline setup time, while entity-body related headers (e.g. 
content-length) should be set at pipeline execution time.
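
As a rough illustration of that rule combined with the HEAD handling discussed above, here is a sketch in Python (for brevity only, not Cocoon code). The Pipeline class, its setup/execute split and the serve function are hypothetical names, and using the cache key directly as a weak ETag assumes it contains no quote characters.

```python
class Pipeline:
    """Hypothetical two-phase pipeline: setup() first, execute() optionally."""

    def __init__(self, cache_key, produce_body):
        self.cache_key = cache_key
        self.produce_body = produce_body

    def setup(self, headers):
        # Setup time: headers that do not depend on the generated body.
        headers["Vary"] = "User-Agent"
        headers["ETag"] = 'W/"%s"' % self.cache_key

    def execute(self, headers):
        # Execution time: headers that can only be known once the body exists.
        body = self.produce_body()
        headers["Content-Length"] = str(len(body))
        return body


def serve(method, pipeline):
    headers = {}
    pipeline.setup(headers)
    if method == "HEAD":
        # Built and set up, but not executed: no body is produced at all.
        return headers, b""
    return headers, pipeline.execute(headers)
```

In this sketch a HEAD response carries the setup-time headers but no Content-Length, since that value is only known after execution.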

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com