You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Brian Akins <ba...@web.turner.com> on 2005/12/06 15:29:42 UTC

2.2 mod_http_proxy and "partial" pages

I have a serious issue.  It seems that if something happens during a 
proxy request after mod_http_proxy starts reading from the backend 
server, no error is reported. (IE, see what happens when ap_pass_brigade 
  returns non success).  The problem is that this "partial page" may be 
cached because r->status is a cacheable code, but the response is 
actually "broken."  The result is that the broken version may be cached 
and served again and again.

Should we break RFC (maybe) and change r->status to 503 (for example) so 
cache can check letter.

Note: this is using our internal mod_cache, but looks like the issue 
would be the same in stock mod_cache.


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Brian Akins wrote:

>  From the best I can tell, the issue is in the proxy code.  When a 
> response gets "truncated" for whatever reason, it doesn't pass an error 
> along, so the filters never know that "something bad" happened.
> 
> 

 From mod_proxy_http.c

in the function ap_proxy_http_process_response:

    /* try send what we read */
                     if (ap_pass_brigade(r->output_filters, bb) != 
APR_SUCCESS
                         || c->aborted) {
                         /* Ack! Phbtt! Die! User aborted! */
                         backend->close = 1;  /* this causes socket 
close below */
                         finish = TRUE;
                     }
 



It seems this will allow a "partial" response to be passed down the 
filter chain with nothing noting that it was an error condition.  This 
exact situation seems to be what I am seeing.


Also, we are still using AP_SERVER_BASEVERSION rather than 
ap_get_server_version() for Via.


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Jim Jagielski wrote:

> Hmmm. I haven't taken a look yet, but is seems to me that
> only complete responses should be cached, not partial, and
> as such we need some better mechanism in place for that
> "Not Cache-able", "Could be Cache-able" and "To-Be-Cached"
> state tree.
> 

 From the best I can tell, the issue is in the proxy code.  When a 
response gets "truncated" for whatever reason, it doesn't pass an error 
along, so the filters never know that "something bad" happened.


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Jim Jagielski <ji...@jaguNET.com>.
On Dec 6, 2005, at 9:29 AM, Brian Akins wrote:

> I have a serious issue.  It seems that if something happens during  
> a proxy request after mod_http_proxy starts reading from the  
> backend server, no error is reported. (IE, see what happens when  
> ap_pass_brigade  returns non success).  The problem is that this  
> "partial page" may be cached because r->status is a cacheable code,  
> but the response is actually "broken."  The result is that the  
> broken version may be cached and served again and again.
>
> Should we break RFC (maybe) and change r->status to 503 (for  
> example) so cache can check letter.
>
> Note: this is using our internal mod_cache, but looks like the  
> issue would be the same in stock mod_cache.
>

Hmmm. I haven't taken a look yet, but is seems to me that
only complete responses should be cached, not partial, and
as such we need some better mechanism in place for that
"Not Cache-able", "Could be Cache-able" and "To-Be-Cached"
state tree.

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Roy T. Fielding wrote:

> Sorry, reverse proxy is not a proxy -- it is a gateway.

Sure it is.  but that's semantics and any further discussion of that 
would be futile.

> You are right that it won't make a difference in the gateway case,
> though from a software design perspective, pipe-and-filter architectures
> depend on sending errors down the pipe so that each filter can handle
> the error.  Having the filter set a global variable that is normally
> private to the client connection, just for the sake of obtaining the
> side-effect desired specifically for HTTP and mod_cache, is, well,
> unclean.

No, we are aborting the entire response in the proxy handler.  Other 
modules (see mod_dav) already do something similar.

If we to handle this via an "error" bucket, this needs to be well 
advertised, as several things now just check r->connection->aborted 
(mod_proxy_http for one).

As is, the current "feature" has halted and reversed my roll-out of 2.2.


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Dec 6, 2005, at 12:45 PM, Brian Akins wrote:
> That would work, I suppose, but wouldn't the client get the same  
> impression "proxy is broken"?  I generally only deal with "reverse  
> proxies," so if origin is broken, whole site is broken...

Sorry, reverse proxy is not a proxy -- it is a gateway.

You are right that it won't make a difference in the gateway case,
though from a software design perspective, pipe-and-filter architectures
depend on sending errors down the pipe so that each filter can handle
the error.  Having the filter set a global variable that is normally
private to the client connection, just for the sake of obtaining the
side-effect desired specifically for HTTP and mod_cache, is, well,
unclean.

....Roy

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <br...@turner.com>.
Roy T. Fielding wrote:

> That would depend on the current state of the proxy's response to
> the client, since it may have not even sent the status code on the
> wire.  If not, then marking the connection as aborted will just make
> it look like the proxy is broken, so we should send a 503 error
> message instead.

In the section we are examining, we have already set r->sent_bodyct, so, 
as far as we are concerned, we have sent data to the client.


> If the response code has already been sent on the wire, then HTTP
> has no way to communicate the break other than body != C-L.
> Note, however, that waka does have such a mechanism, so assuming
> that the entire chain is broken is incorrect  Instead, send an error
> bucket down the chain and have the proxy mark the connection as bad
> after sending the last good data packet.


That would work, I suppose, but wouldn't the client get the same 
impression "proxy is broken"?  I generally only deal with "reverse 
proxies," so if origin is broken, whole site is broken...

-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by "Roy T. Fielding" <fi...@gbiv.com>.
On Dec 6, 2005, at 9:38 AM, Justin Erenkrantz wrote:

> On Tue, Dec 06, 2005 at 12:22:18PM -0500, Brian Akins wrote:
>>> There *might* be a breakage if the server aborted it's half of the
>>> connection partway through the response.  I don't have the time  
>>> to fully
>>> look at the code, but there might be a code path that does so.   
>>> -- justin
>>
>> From what I can tell, stock proxy and cache is vulnerable to that.
>
> After a bit more of thinking, the right thing to do would be to have
> mod_proxy force a dropped connection to the client.  Since the backend
> bailed on us, there's no way for us to finish the response - we've  
> likely
> already guaranteed the headers, but have no way to keep the connection
> 'correct' with respect to the metadata we've already sent.  We must  
> bail.

That would depend on the current state of the proxy's response to
the client, since it may have not even sent the status code on the
wire.  If not, then marking the connection as aborted will just make
it look like the proxy is broken, so we should send a 503 error
message instead.

If the response code has already been sent on the wire, then HTTP
has no way to communicate the break other than body != C-L.
Note, however, that waka does have such a mechanism, so assuming
that the entire chain is broken is incorrect  Instead, send an error
bucket down the chain and have the proxy mark the connection as bad
after sending the last good data packet.

....Roy

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Brian Akins wrote:
> Justin Erenkrantz wrote:
> 
>> I think that's a side effect of being aborted.  I think we need to 
>> first abort the connection and then set that field.  =)  -- justin
>>
> 
> :) That's what I figured, just greping on "aborted" isn't showing 
> anything obvious...
> 
> 

Apparently that's how mod_dav does it.  (see mod_dav.c:4076).  From what 
I can tell some of the core stuff picks up on this.  It can get set in 
server/core_filters.c and http_core.c does something with it????

-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Justin Erenkrantz wrote:

> I think that's a side effect of being aborted.  I think we need to first 
> abort the connection and then set that field.  =)  -- justin
> 

:) That's what I figured, just greping on "aborted" isn't showing 
anything obvious...


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On December 6, 2005 2:02:02 PM -0500 Brian Akins <ba...@web.turner.com> 
wrote:

> So, do we just need to set r->connection->aborted = 1 and core will take
> care of it?  If so, a patch should be trivial.

I think that's a side effect of being aborted.  I think we need to first 
abort the connection and then set that field.  =)  -- justin

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On December 6, 2005 3:01:28 PM -0500 Brian Akins <ba...@web.turner.com> 
wrote:

> So we need my patch and your patch, right?  I'm a little "medicated"
> today...

I think so, yes.  -- justin

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Justin Erenkrantz wrote:

> 
> I do think we need to fix mod_proxy_http to return an error.


So we need my patch and your patch, right?  I'm a little "medicated" 
today...


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
--On December 6, 2005 2:19:58 PM -0500 Brian Akins <ba...@web.turner.com> 
wrote:

> Brian Akins wrote:
>> Justin Erenkrantz wrote:
>>
>>>
>>> After a bit more of thinking, the right thing to do would be to have
>>> mod_proxy force a dropped connection to the client.
>>
>>
>> So, do we just need to set r->connection->aborted = 1 and core will take
>>  care of it?  If so, a patch should be trivial.
>>
>
> After a quick glance through mod_http_proxy, this look to be correct:

Hmm.  Maybe I'm wrong.  That does appear how to set it.  Okie doke.

> --- mod_proxy_http.c.orig       2005-12-06 12:11:19.000000000 -0500
> +++ mod_proxy_http.c    2005-12-06 14:06:00.000000000 -0500
> @@ -1482,6 +1482,7 @@
>                       else if (rv != APR_SUCCESS) {
>                           ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c,
>                                         "proxy: error reading response");
> +                        c->aborted = 1;
>                           break;
>                       }
>                       /* next time try a non-blocking read */

I do think we need to fix mod_proxy_http to return an error.

Like so.  -- justin

Index: modules/proxy/mod_proxy_http.c
===================================================================
--- modules/proxy/mod_proxy_http.c      (revision 354485)
+++ modules/proxy/mod_proxy_http.c      (working copy)
@@ -1565,8 +1565,14 @@
            }
             return status;
         }
-    } else
-        return OK;
+    } else {
+        if (c->aborted) {
+            return DONE;
+        }
+        else {
+            return OK;
+        }
+    }
 }

 static

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Brian Akins wrote:
> Justin Erenkrantz wrote:
> 
>>
>> After a bit more of thinking, the right thing to do would be to have
>> mod_proxy force a dropped connection to the client. 
> 
> 
> So, do we just need to set r->connection->aborted = 1 and core will take 
>  care of it?  If so, a patch should be trivial.
> 

After a quick glance through mod_http_proxy, this look to be correct:


--- mod_proxy_http.c.orig       2005-12-06 12:11:19.000000000 -0500
+++ mod_proxy_http.c    2005-12-06 14:06:00.000000000 -0500
@@ -1482,6 +1482,7 @@
                      else if (rv != APR_SUCCESS) {
                          ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c,
                                        "proxy: error reading response");
+                        c->aborted = 1;
                          break;
                      }
                      /* next time try a non-blocking read */

-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Justin Erenkrantz wrote:

> 
> After a bit more of thinking, the right thing to do would be to have
> mod_proxy force a dropped connection to the client. 

So, do we just need to set r->connection->aborted = 1 and core will take 
  care of it?  If so, a patch should be trivial.

-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <br...@turner.com>.
Justin Erenkrantz wrote:
> On Tue, Dec 06, 2005 at 12:22:18PM -0500, Brian Akins wrote:
> 
>>>There *might* be a breakage if the server aborted it's half of the
>>>connection partway through the response.  I don't have the time to fully
>>>look at the code, but there might be a code path that does so.  -- justin
>>
>>>From what I can tell, stock proxy and cache is vulnerable to that.
> 
> 
> After a bit more of thinking, the right thing to do would be to have
> mod_proxy force a dropped connection to the client.  Since the backend
> bailed on us, there's no way for us to finish the response - we've likely
> already guaranteed the headers, but have no way to keep the connection
> 'correct' with respect to the metadata we've already sent.  We must bail.
> 
> If mod_proxy forces the dropped connection, this should, in turn, cause
> mod_cache to not be able to 'finish' the response.  Therefore, there should
> be no cachable partial response to serve.  -- justin

+1

The client is already screwed in this case anyway...

-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Tue, Dec 06, 2005 at 12:22:18PM -0500, Brian Akins wrote:
> >There *might* be a breakage if the server aborted it's half of the
> >connection partway through the response.  I don't have the time to fully
> >look at the code, but there might be a code path that does so.  -- justin
> 
> From what I can tell, stock proxy and cache is vulnerable to that.

After a bit more of thinking, the right thing to do would be to have
mod_proxy force a dropped connection to the client.  Since the backend
bailed on us, there's no way for us to finish the response - we've likely
already guaranteed the headers, but have no way to keep the connection
'correct' with respect to the metadata we've already sent.  We must bail.

If mod_proxy forces the dropped connection, this should, in turn, cause
mod_cache to not be able to 'finish' the response.  Therefore, there should
be no cachable partial response to serve.  -- justin

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <br...@turner.com>.
Justin Erenkrantz wrote:

> Our mod_cache will abort the response if the connection to the original
> client is aborted for whatever reason.  So, I'm doubtful the scenario you
> describe would happen to our mod_cache.  (See mod_disk_cache.c:1013.)

Cool. yep that would help me.


> There *might* be a breakage if the server aborted it's half of the
> connection partway through the response.  I don't have the time to fully
> look at the code, but there might be a code path that does so.  -- justin

 From what I can tell, stock proxy and cache is vulnerable to that.

On a side note, I would like to aggressively attach the performance 
issues I have with stock mod_cache.  I will hopefully give more details 
after I figure out what to do about this situation.


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Tue, Dec 06, 2005 at 09:29:42AM -0500, Brian Akins wrote:
> I have a serious issue.  It seems that if something happens during a 
> proxy request after mod_http_proxy starts reading from the backend 
> server, no error is reported. (IE, see what happens when ap_pass_brigade 
>  returns non success).  The problem is that this "partial page" may be 
> cached because r->status is a cacheable code, but the response is 
> actually "broken."  The result is that the broken version may be cached 
> and served again and again.
> 
> Should we break RFC (maybe) and change r->status to 503 (for example) so 
> cache can check letter.
> 
> Note: this is using our internal mod_cache, but looks like the issue 
> would be the same in stock mod_cache.

Our mod_cache will abort the response if the connection to the original
client is aborted for whatever reason.  So, I'm doubtful the scenario you
describe would happen to our mod_cache.  (See mod_disk_cache.c:1013.)

There *might* be a breakage if the server aborted it's half of the
connection partway through the response.  I don't have the time to fully
look at the code, but there might be a code path that does so.  -- justin

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Tue, Dec 06, 2005 at 12:07:32PM -0500, Brian Akins wrote:
> in mod_cache in store_body check r->status on every bucket?  This may 
> need to be in providers for now???

No.  Changing the status after the first write will not matter.  -- justin

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Brian Akins wrote:

> As a quick fix, could we not have proxy set r->status = HTTP_BAD_GATEWAY 
> or something and re-check in the cache before finalizing the store?

pseudo code:

in proxy_http

if(some proxy error) {
	error_log("error during transit. forcing status change");
	r->status = HTTP_GATEWAY_TIME_OUT;  /*(or something not cahceable)*/
}


in mod_cache in store_body check r->status on every bucket?  This may 
need to be in providers for now???





-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On Tue, Dec 06, 2005 at 12:10:44PM -0500, Brian Akins wrote:
> Since I do not use the "stock" mod_cache, I cannot really test it. 
> However, I can try to get together a patch that changes r->status in 
> these cases.  Is that acceptable?  Will this screw up proxy_balancer or 
> is it out of the picture by this point?

Modifying r->status is not an option.  The right thing to do if there is a
backend error reading the response from upstream is for mod_proxy to return
an error as the handler.  Then, mod_cache's error handling code should come
into play and that should trigger the data file never being put into the
correct place.

The code around mod_proxy_http.c:1550 appears to be bogus.  -- justin

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <ba...@web.turner.com>.
Paul Querna wrote:

>> As a quick fix, could we not have proxy set r->status = 
>> HTTP_BAD_GATEWAY or something and re-check in the cache before 
>> finalizing the store?
>>
>>
> Yes, and then remove the problem content and header file.


Since I do not use the "stock" mod_cache, I cannot really test it. 
However, I can try to get together a patch that changes r->status in 
these cases.  Is that acceptable?  Will this screw up proxy_balancer or 
is it out of the picture by this point?



-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Paul Querna <ch...@force-elite.com>.
Brian Akins wrote:
> Paul Querna wrote:
>
>> Related issue:
>>
>> http://issues.apache.org/bugzilla/show_bug.cgi?id=15866
>>
>> I don't think its breaking the RFC to not-cache partial pages.
>
> Yep. That's my issue:
>
> This one doesn't have an easy solution.  The problem is that mod_proxy 
> currently
> has no way to tell mod_cache if a response terminated abnormally.  We 
> could add
> some code in mod_cache, to make sure Content-Length matches that 
> actual length,
> and invalidate the cache at that point.
>
>
> I would really like to use the "stock" 2.2 proxy modules, but I cannot 
> with this "bug."
>
> As a quick fix, could we not have proxy set r->status = 
> HTTP_BAD_GATEWAY or something and re-check in the cache before 
> finalizing the store?
>
>
Yes, and then remove the problem content and header file.

-Paul

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Brian Akins <br...@turner.com>.
Paul Querna wrote:

> Related issue:
> 
> http://issues.apache.org/bugzilla/show_bug.cgi?id=15866
> 
> I don't think its breaking the RFC to not-cache partial pages.

Yep. That's my issue:

This one doesn't have an easy solution.  The problem is that mod_proxy 
currently
has no way to tell mod_cache if a response terminated abnormally.  We 
could add
some code in mod_cache, to make sure Content-Length matches that actual 
length,
and invalidate the cache at that point.


I would really like to use the "stock" 2.2 proxy modules, but I cannot 
with this "bug."

As a quick fix, could we not have proxy set r->status = HTTP_BAD_GATEWAY 
or something and re-check in the cache before finalizing the store?


-- 
Brian Akins
Lead Systems Engineer
CNN Internet Technologies

Re: 2.2 mod_http_proxy and "partial" pages

Posted by Paul Querna <ch...@force-elite.com>.
Brian Akins wrote:
> I have a serious issue.  It seems that if something happens during a 
> proxy request after mod_http_proxy starts reading from the backend 
> server, no error is reported. (IE, see what happens when 
> ap_pass_brigade  returns non success).  The problem is that this 
> "partial page" may be cached because r->status is a cacheable code, 
> but the response is actually "broken."  The result is that the broken 
> version may be cached and served again and again.
>
> Should we break RFC (maybe) and change r->status to 503 (for example) 
> so cache can check letter.
>
> Note: this is using our internal mod_cache, but looks like the issue 
> would be the same in stock mod_cache.
>
>
Related issue:

http://issues.apache.org/bugzilla/show_bug.cgi?id=15866

I don't think its breaking the RFC to not-cache partial pages.

-Paul