Posted to dev@httpd.apache.org by Graham Leggett <mi...@sharp.fm> on 2002/05/05 20:03:24 UTC

Stripping Content-Length headers

Hi all,

A question was raised in Bugzilla about proxy stripping the
Content-Length header from the backend response.

Is this really necessary?

I understand the Content Length filter is responsible for sorting out
Content-Length, and that chunked encoding will be enabled should the
length be uncalculate-able, so it works as it is - but the question is,
if we already have a content-length, should we not just keep it?

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight..."

Re: Stripping Content-Length headers

Posted by Joshua Slive <jo...@slive.ca>.
Graham Leggett wrote:
> Joshua Slive wrote:
>>Just to start, what about HTTP/1.0 clients?
> They would just get no content-length. It works, but it would seem
> it's not ideal.

Yep.  You lose 1.0-keepalives and progress indicators in clients, plus 
you probably screw up client-side caching.

> 
>>Even for HTTP/1.1 clients,
>>it seems chunked encoding should only be used when necessary.  Chunked
>>encoding is extra overhead, and removes information that may be valuable
>>down-the-line.  We don't send chunked encoding for ordinary static
>>content when we are the origin server, do we?
> 
> If we already have a content-length, surely we should take advantage of
> it if we can? In theory only filters that change content length should
> touch the content length.

Right.  That would be my opinion as well.

Joshua.




Re: Stripping Content-Length headers

Posted by Graham Leggett <mi...@sharp.fm>.
Joshua Slive wrote:

> Just to start, what about HTTP/1.0 clients?

They would just get no content-length. It works, but it would seem
it's not ideal.

> Even for HTTP/1.1 clients,
> it seems chunked encoding should only be used when necessary.  Chunked
> encoding is extra overhead, and removes information that may be valuable
> down-the-line.  We don't send chunked encoding for ordinary static
> content when we are the origin server, do we?

If we already have a content-length, surely we should take advantage of
it if we can? In theory only filters that change content length should
touch the content length.

Regards,
Graham

Re: Stripping Content-Length headers

Posted by Joshua Slive <jo...@slive.ca>.
Justin Erenkrantz wrote:
> IMHO, proxy is especially good for chunked encoding, because we read
> the proxy and if we use chunks on the output, we can return the data
> sooner than waiting for the entire response to be read before
> writing anything to the network.  -- justin

Sure, but a content-length header and unmodified content provides the 
same thing.

Chunked encoding is great when the alternative is no persistent 
connections, or queuing up the whole request to get the size.  But we 
should be using the content length when we have it available.  I really 
don't know why the proxy is stripping it in the first place.  It should 
only be stripped if the content is changed by the proxy or one of the 
filters.

Anyway, I have no idea how to fix this, so I'll just shut up.

Joshua.


Re: Stripping Content-Length headers

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 02:56:02PM -0400, Joshua Slive wrote:
> Just to start, what about HTTP/1.0 clients?   Even for HTTP/1.1 clients, 
> it seems chunked encoding should only be used when necessary.  Chunked 
> encoding is extra overhead, and removes information that may be valuable 
> down-the-line.  We don't send chunked encoding for ordinary static 
> content when we are the origin server, do we?

Nope, we let our normal logic take over - we don't do anything
special.  If our client can't handle it (i.e. 1.0 client), then
the server adjusts accordingly.  It will be treated just like
any other served page (as far as our output filters are
concerned).  Whatever logic we use to handle chunking or
keepalives or whatever is just reused with proxy as with
any other page we serve.  

There is logic in the output filters for determining when to do
chunked encoding.  This is entirely separate from the proxy code.
In fact, httpd-2.0 is much more prone to using chunked encoding
than 1.3 ever was.  What decision the proxy server made for
C-L or encoding should have no impact on what we do for our client
(imagine using mod_deflate in conjunction with proxy).

IMHO, proxy is especially good for chunked encoding, because we read
the proxy and if we use chunks on the output, we can return the data
sooner than waiting for the entire response to be read before
writing anything to the network.  -- justin

Re: Stripping Content-Length headers

Posted by Joshua Slive <jo...@slive.ca>.
Justin Erenkrantz wrote:
> On Sun, May 05, 2002 at 08:03:24PM +0200, Graham Leggett wrote:
> 
>>I understand the Content Length filter is responsible for sorting out
>>Content-Length, and that chunked encoding will be enabled should the
>>length be uncalculate-able, so it works as it is - but the question is,
>>if we already have a content-length, should we not just keep it?
> 
> 
> Nah, this allows us flexibility in optimizing the data sent to our
> client.  If we can send chunked-encoding, I believe that is better
> than using C-L.  I believe that the RFC allows us to do these sorts
> of optimizations.  
> 
> IIRC, the PR wasn't saying there was a problem with our approach - it
> was just that the admin didn't understand that was legal.  -- justin

Whoa... Take this with the usual "I'm no expert in this" preface, but 
this doesn't seem right.

Just to start, what about HTTP/1.0 clients?   Even for HTTP/1.1 clients, 
it seems chunked encoding should only be used when necessary.  Chunked 
encoding is extra overhead, and removes information that may be valuable 
down-the-line.  We don't send chunked encoding for ordinary static 
content when we are the origin server, do we?
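
To put a rough number on the "extra overhead", here is a minimal sketch that counts the framing bytes chunked coding adds to a body. The function name `chunked_overhead` is made up for illustration, and it assumes plain chunks with no chunk extensions or trailers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Framing bytes that HTTP/1.1 chunked coding adds to a body of body_len
 * bytes sent in chunks of at most chunk_len bytes.
 * Per chunk: hex size digits + CRLF before the data, CRLF after it.
 * The stream ends with the last-chunk marker "0\r\n\r\n" (5 bytes).
 * Assumption: no chunk extensions and no trailers. */
size_t chunked_overhead(size_t body_len, size_t chunk_len)
{
    size_t overhead = 5;                 /* "0" CRLF CRLF */

    assert(chunk_len > 0);
    while (body_len > 0) {
        size_t n = body_len < chunk_len ? body_len : chunk_len;
        /* snprintf with a NULL buffer returns the number of hex digits */
        overhead += (size_t)snprintf(NULL, 0, "%zx", n) + 2 + 2;
        body_len -= n;
    }
    return overhead;
}
```

For a 32000-byte body sent as four 8000-byte chunks this comes to 37 bytes, so the byte cost is small; the larger loss is the total length itself, which the client no longer sees up front.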

Joshua.


Re: Stripping Content-Length headers

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 01:39:33PM -0700, Ryan Bloom wrote:
> That is a long-standing bug that most of us have argued against now.  In
> the past, we said that the C-L had to be removed, because a filter could
> have modified the length.  Now, most of us have said that this is just
> plain wrong, and if a filter wants to modify the length, then they
> should remove the C-L for us.

Ah, okay.  So, this answers my question I just asked - if a C-L
entry exists in r->headers_out, ap_content_length_filter should
just get out of the way, right?  -- justin

Re: Stripping Content-Length headers

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 08:09:47PM -0400, rbb@apache.org wrote:
> Take a closer look at the filter.  We almost never actually buffer data in
> the C-L filter.

I'm specifically looking at the case where we have an HTTP/1.0
connection that is kept-alive (partial_send_okay == 0).  In that
case, it is buffered in the C-L filter (stored in ctx->saved).
Or, am I missing something?  

My point is that we could optimize the case where we already
know the C-L.  -- justin
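
The optimization being suggested can be sketched as a single decision. This is not the httpd filter code: `struct response_info` and `must_buffer_until_eos` are hypothetical names, and only the `partial_send_okay` flag is taken from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a response; not httpd's request_rec. */
struct response_info {
    bool partial_send_okay;   /* may data be passed on before EOS is seen? */
    bool has_content_length;  /* is a C-L already present in the headers? */
};

/* Returns true when the body has to be held back until end-of-stream so
 * that a Content-Length can be computed.  With the proposed optimization,
 * a length that is already known makes the buffering unnecessary. */
bool must_buffer_until_eos(const struct response_info *r)
{
    assert(r != NULL);
    if (r->has_content_length) {
        return false;         /* length known: stream to the network */
    }
    return !r->partial_send_okay;
}
```

Under this sketch, an HTTP/1.0 keepalive response that arrives with a C-L is streamed, and only one without a C-L is saved up until EOS.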

Re: Stripping Content-Length headers

Posted by rb...@apache.org.
On Sun, 5 May 2002, Justin Erenkrantz wrote:

> On Sun, May 05, 2002 at 06:26:53PM -0400, rbb@apache.org wrote:
> > The filter should remain regardless of whether there is a C-L, because we
> > use that filter to determine exactly how much data was sent through the
> > filter stack.  That is useful information, and it doesn't hurt performance
> > in any meaningful way (We are just looping through the brigade).  However,
> > the filter should be smart enough to leave the C-L alone, unless it is
> > required.
> 
> Okay, I guess.  I do think it'd cause a performance hit though.
> But, I won't push it.  I don't see the need for it though.
> 
> However, I'm more concerned that we'll be buffering data on
> HTTP/1.0 requests that already have a C-L.  This would be
> enormously expensive when we could have streamed the data
> to the network rather than holding on to it until EOS is
> seen.  -- justin

Take a closer look at the filter.  We almost never actually buffer data in
the C-L filter.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
550 Jean St
Oakland CA 94610
-------------------------------------------------------------------------------


Re: Stripping Content-Length headers

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 06:26:53PM -0400, rbb@apache.org wrote:
> The filter should remain regardless of whether there is a C-L, because we
> use that filter to determine exactly how much data was sent through the
> filter stack.  That is useful information, and it doesn't hurt performance
> in any meaningful way (We are just looping through the brigade).  However,
> the filter should be smart enough to leave the C-L alone, unless it is
> required.

Okay, I guess.  I do think it'd cause a performance hit though.
But, I won't push it.  I don't see the need for it though.

However, I'm more concerned that we'll be buffering data on
HTTP/1.0 requests that already have a C-L.  This would be
enormously expensive when we could have streamed the data
to the network rather than holding on to it until EOS is
seen.  -- justin

Re: Stripping Content-Length headers

Posted by Graham Leggett <mi...@sharp.fm>.
Justin Erenkrantz wrote:

> Here's my question: why is C-L filter even getting involved if there
> is a C-L header?  Why shouldn't it just get out of the way if there
> is a C-L already present?  Why do the duplication?  Should we assume
> that any module that changes the content is smart enough to unset it?
> Perhaps we can't do that - a module might change the content but not
> unset C-L?  -- justin

I would argue that a module that changes the content-length, but does
not remove the CL header is broken.

Regards,
Graham

Re: Stripping Content-Length headers

Posted by rb...@apache.org.
> Here's my question: why is C-L filter even getting involved if there
> is a C-L header?  Why shouldn't it just get out of the way if there
> is a C-L already present?  Why do the duplication?  Should we assume
> that any module that changes the content is smart enough to unset it?
> Perhaps we can't do that - a module might change the content but not
> unset C-L?  -- justin

The filter should remain regardless of whether there is a C-L, because we
use that filter to determine exactly how much data was sent through the
filter stack.  That is useful information, and it doesn't hurt performance
in any meaningful way (We are just looping through the brigade).  However,
the filter should be smart enough to leave the C-L alone, unless it is
required.
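
A minimal sketch of that behaviour, assuming a toy linked list in place of a real bucket brigade. The types `struct bucket` and `struct out_headers` and the function name are invented for illustration and are not httpd or APR APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a bucket brigade and the output headers. */
struct bucket {
    size_t length;
    const struct bucket *next;
};

struct out_headers {
    int has_content_length;   /* non-zero if a C-L header is already set */
    size_t content_length;    /* its value, when set */
};

/* Walks the brigade once to total the bytes that passed through and
 * returns that total.  An existing C-L is left untouched; a value is
 * filled in only when the header is missing and the whole response has
 * been seen (saw_eos is non-zero). */
size_t count_and_maybe_set_length(const struct bucket *head, int saw_eos,
                                  struct out_headers *h)
{
    size_t total = 0;

    assert(h != NULL);
    for (const struct bucket *b = head; b != NULL; b = b->next) {
        total += b->length;
    }
    if (!h->has_content_length && saw_eos) {
        h->has_content_length = 1;
        h->content_length = total;
    }
    return total;
}
```

The byte count is always available for logging, while the header is only written when nobody upstream supplied one.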

Ryan



Re: Stripping Content-Length headers

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 01:17:28PM -0700, Justin Erenkrantz wrote:
> On Sun, May 05, 2002 at 12:54:37PM -0700, Roy T. Fielding wrote:
> > It is legal, but not advisable.  It is not better to use chunking than
> > it is to use C-L.  C-L works better with HTTP/1.0 downstream and allows
> > progress bars to exist on big downloads.  Any filter that does not
> > transform the content should not modify the C-L.
> 
> The C-L filter has logic that if we are ever passed more than
> 32000 bytes (4 * AP_MIN_BYTES_TO_WRITE) and we're HTTP/1.1
> downstream, we use chunked encoding.  Otherwise, we'll use
> C-L.  Or, if a flush bucket is ever sent, we'll use chunked.
> This is the way the entire server operates (regardless of
> proxy).  -- justin

Actually, to correct myself, that isn't exactly true.  It has to do
with the convoluted conditional in ap_set_keepalive() mixed with the
C-L filter.  The partial send in the C-L filter doesn't directly
imply that we'll use chunked, but we'll use chunked if there was no
C-L header set (but the C-L filter won't be setting it when doing a
partial send - it had to be there before the filter started).  This is
due to the ap_set_keepalive() r->chunked = 1 side-effect.
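
The interaction can be modelled in a few lines. This is a pared-down sketch, not the real conditional: `struct req` and `can_keep_alive` are hypothetical, and the actual ap_set_keepalive() checks several more conditions than the two shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, pared-down request state. */
struct req {
    int http_minor;           /* 0 for HTTP/1.0, 1 for HTTP/1.1 */
    bool has_content_length;  /* C-L present in the output headers */
    bool chunked;             /* switched on as a side effect below */
};

/* Decides whether the connection can stay open.  The body has to be
 * self-delimiting: either a C-L is present, or the client speaks
 * HTTP/1.1 and chunking is turned on as a side effect of the test. */
bool can_keep_alive(struct req *r)
{
    assert(r != NULL);
    if (r->has_content_length) {
        return true;
    }
    if (r->http_minor >= 1) {
        r->chunked = true;    /* models the "r->chunked = 1" side effect */
        return true;
    }
    return false;             /* HTTP/1.0 without C-L: closing marks the end */
}
```

In this model a response that reaches the check with a C-L is never chunked, which is why stripping the header earlier changes the framing the client receives.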

Here's my question: why is C-L filter even getting involved if there
is a C-L header?  Why shouldn't it just get out of the way if there
is a C-L already present?  Why do the duplication?  Should we assume
that any module that changes the content is smart enough to unset it?
Perhaps we can't do that - a module might change the content but not
unset C-L?  -- justin

RE: Stripping Content-Length headers

Posted by Ryan Bloom <rb...@covalent.net>.
> From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> 
> On Sun, May 05, 2002 at 12:54:37PM -0700, Roy T. Fielding wrote:
> > It is legal, but not advisable.  It is not better to use chunking than
> > it is to use C-L.  C-L works better with HTTP/1.0 downstream and allows
> > progress bars to exist on big downloads.  Any filter that does not
> > transform the content should not modify the C-L.
> 
> The C-L filter has logic that if we are ever passed more than
> 32000 bytes (4 * AP_MIN_BYTES_TO_WRITE) and we're HTTP/1.1
> downstream, we use chunked encoding.  Otherwise, we'll use
> C-L.  Or, if a flush bucket is ever sent, we'll use chunked.
> This is the way the entire server operates (regardless of
> proxy).  -- Justin

That is a long-standing bug that most of us have argued against now.  In
the past, we said that the C-L had to be removed, because a filter could
have modified the length.  Now, most of us have said that this is just
plain wrong, and if a filter wants to modify the length, then they
should remove the C-L for us.
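
As a sketch of that division of labour, here is a toy filter that changes the body length and therefore drops the C-L itself. The names `struct out_headers` and `strip_spaces_filter` are hypothetical; a real filter would unset the header in r->headers_out rather than flip a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output headers; not httpd's apr_table_t. */
struct out_headers {
    int has_content_length;
    size_t content_length;
};

/* A filter that may change the byte count: it removes every space from
 * buf in place.  Because the old length can no longer be trusted, the
 * filter invalidates C-L itself instead of relying on the core to strip
 * it for every response.  Returns the new length. */
size_t strip_spaces_filter(char *buf, size_t len, struct out_headers *h)
{
    size_t out = 0;

    assert(buf != NULL && h != NULL);
    h->has_content_length = 0;    /* drop C-L up front */
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != ' ') {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

Responses that pass through no such filter keep whatever C-L they arrived with.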

Ryan



Re: Stripping Content-Length headers

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 12:54:37PM -0700, Roy T. Fielding wrote:
> It is legal, but not advisable.  It is not better to use chunking than
> it is to use C-L.  C-L works better with HTTP/1.0 downstream and allows
> progress bars to exist on big downloads.  Any filter that does not
> transform the content should not modify the C-L.

The C-L filter has logic that if we are ever passed more than
32000 bytes (4 * AP_MIN_BYTES_TO_WRITE) and we're HTTP/1.1
downstream, we use chunked encoding.  Otherwise, we'll use
C-L.  Or, if a flush bucket is ever sent, we'll use chunked.
This is the way the entire server operates (regardless of
proxy).  -- justin
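
The rule as stated in this message can be sketched as follows. The names are hypothetical, the value 8000 is inferred from the "32000 bytes (4 * AP_MIN_BYTES_TO_WRITE)" figure above, and the sketch assumes a flush only leads to chunking for an HTTP/1.1 client, since an HTTP/1.0 client cannot parse chunks. A follow-up in this thread qualifies the rule, so treat this as a model of the description, not of the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for AP_MIN_BYTES_TO_WRITE, inferred from 32000 / 4. */
#define MIN_BYTES_TO_WRITE 8000

enum framing { USE_CONTENT_LENGTH, USE_CHUNKED };

/* Picks the response framing from what the filter has seen so far. */
enum framing choose_framing(size_t bytes_seen, bool http11_client,
                            bool saw_flush)
{
    /* sanity check against the figure quoted in the message */
    assert(4 * MIN_BYTES_TO_WRITE == 32000);

    if (http11_client
        && (saw_flush || bytes_seen > 4 * MIN_BYTES_TO_WRITE)) {
        return USE_CHUNKED;
    }
    return USE_CONTENT_LENGTH;
}
```

Note that nothing in this decision looks at whether a C-L was already supplied, which is the behaviour the rest of the thread objects to.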

Re: Stripping Content-Length headers

Posted by Graham Leggett <mi...@sharp.fm>.
Joshua Slive wrote:

> As far as I can tell, the C-L is still trashed.  Doug just removed a
> "flush".  See
>         /* In order for ap_set_keepalive to work properly, we can NOT
>          * have any length information stored in the output headers.
>          */
>         apr_table_unset(r->headers_out,"Transfer-Encoding");
>         apr_table_unset(r->headers_out,"Content-Length");
> 
> in ap_proxy_http_process_response in proxy_http.c.

In the case of the "transfer-encoding", surely this header should be
removed by the dechunk filter? (or whatever bit of the core was doing
dechunking, as the dechunk filter was absorbed into another filter).

In other words, proxy shouldn't be fiddling with headers, rather the
filters should be fiddling with headers instead.

This looks wrong to me.

Regards,
Graham

Re: Stripping Content-Length headers

Posted by Joshua Slive <jo...@slive.ca>.
Ryan Bloom wrote:

> To the best of my knowledge, Doug actually fixed this problem after the
> 2.0.35 release.  We should keep the C-L unless we are modifying the
> content.  To throw away the C-L means that we are removing information
> that can be used to help determine if the content has changed.

(Whoops, I said I was going to shut up.)

As far as I can tell, the C-L is still trashed.  Doug just removed a 
"flush".  See
        /* In order for ap_set_keepalive to work properly, we can NOT
         * have any length information stored in the output headers.
         */
        apr_table_unset(r->headers_out,"Transfer-Encoding");
        apr_table_unset(r->headers_out,"Content-Length");

in ap_proxy_http_process_response in proxy_http.c.
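
For contrast, here is a sketch of a more selective policy than the two unconditional unsets above. It is an assumption about what a fix could look like, not the actual proxy_http.c change: the header array, `unset_header`, and `proxy_fix_headers` are all hypothetical, and names are compared case-sensitively for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flat header list; not httpd's apr_table_t. */
struct header {
    const char *name;
    const char *value;
};

/* Removes every header called name and returns the new count. */
size_t unset_header(struct header *h, size_t n, const char *name)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(h[i].name, name) != 0) {
            h[out++] = h[i];
        }
    }
    return out;
}

/* Transfer-Encoding describes only the backend hop, so it always goes.
 * Content-Length is kept unless the backend body was chunked and has
 * been decoded, in which case any C-L alongside it is not reliable. */
size_t proxy_fix_headers(struct header *h, size_t n, int body_was_dechunked)
{
    assert(h != NULL || n == 0);
    n = unset_header(h, n, "Transfer-Encoding");
    if (body_was_dechunked) {
        n = unset_header(h, n, "Content-Length");
    }
    return n;
}
```

With this policy a backend response that carries a plain C-L reaches the output filters with that header intact.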

Joshua.




RE: Stripping Content-Length headers

Posted by Ryan Bloom <rb...@covalent.net>.
> From: Justin Erenkrantz [mailto:jerenkrantz@apache.org]
> 
> On Sun, May 05, 2002 at 08:03:24PM +0200, Graham Leggett wrote:
> > I understand the Content Length filter is responsible for sorting out
> > Content-Length, and that chunked encoding will be enabled should the
> > length be uncalculate-able, so it works as it is - but the question is,
> > if we already have a content-length, should we not just keep it?
> 
> Nah, this allows us flexibility in optimizing the data sent to our
> client.  If we can send chunked-encoding, I believe that is better
> than using C-L.  I believe that the RFC allows us to do these sorts
> of optimizations.
> 
> IIRC, the PR wasn't saying there was a problem with our approach - it
> was just that the admin didn't understand that was legal.  -- Justin

To the best of my knowledge, Doug actually fixed this problem after the
2.0.35 release.  We should keep the C-L unless we are modifying the
content.  To throw away the C-L means that we are removing information
that can be used to help determine if the content has changed.

Ryan



Re: Stripping Content-Length headers

Posted by "Roy T. Fielding" <fi...@apache.org>.
On Sunday, May 5, 2002, at 11:25  AM, Justin Erenkrantz wrote:

> On Sun, May 05, 2002 at 08:03:24PM +0200, Graham Leggett wrote:
>> I understand the Content Length filter is responsible for sorting out
>> Content-Length, and that chunked encoding will be enabled should the
>> length be uncalculate-able, so it works as it is - but the question is,
>> if we already have a content-length, should we not just keep it?
>
> Nah, this allows us flexibility in optimizing the data sent to our
> client.  If we can send chunked-encoding, I believe that is better
> than using C-L.  I believe that the RFC allows us to do these sorts
> of optimizations.
>
> IIRC, the PR wasn't saying there was a problem with our approach - it
> was just that the admin didn't understand that was legal.  -- justin

It is legal, but not advisable.  It is not better to use chunking than
it is to use C-L.  C-L works better with HTTP/1.0 downstream and allows
progress bars to exist on big downloads.  Any filter that does not
transform the content should not modify the C-L.
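
The progress-bar point can be illustrated from the client's side. This is a minimal sketch with a made-up function name; it assumes the client passes a negative total when no C-L was received.

```c
#include <assert.h>

/* Percent complete for a download, or -1 when the total size is unknown
 * because the response carried no Content-Length (for example, because
 * it was chunked).  content_length < 0 means "no C-L received". */
int download_progress(long long received, long long content_length)
{
    assert(received >= 0);
    if (content_length < 0) {
        return -1;            /* nothing to scale against */
    }
    if (received >= content_length) {
        return 100;
    }
    return (int)(received * 100 / content_length);
}
```

Without the header the client can only show bytes received so far, never a fraction of the whole.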

....Roy


Re: Stripping Content-Length headers

Posted by Justin Erenkrantz <je...@apache.org>.
On Sun, May 05, 2002 at 08:03:24PM +0200, Graham Leggett wrote:
> I understand the Content Length filter is responsible for sorting out
> Content-Length, and that chunked encoding will be enabled should the
> length be uncalculate-able, so it works as it is - but the question is,
> if we already have a content-length, should we not just keep it?

Nah, this allows us flexibility in optimizing the data sent to our
client.  If we can send chunked-encoding, I believe that is better
than using C-L.  I believe that the RFC allows us to do these sorts
of optimizations.  

IIRC, the PR wasn't saying there was a problem with our approach - it
was just that the admin didn't understand that was legal.  -- justin