Posted to dev@httpd.apache.org by Matthias Behrens <ma...@gulp.de> on 2006/01/05 18:47:12 UTC

mod_proxy vs. serverpush

hi everybody

i am running a CGI server farm behind an Apache web server with mod_rewrite / mod_proxy

one CGI is supposed to send a progress bar using server push.
every time the program completes 5% of its work, it sends further HTML code which lets the bar grow.

you can all test this by going to

http://www.gulp.de/kb/tools/trend.htm

just type some IT skills into the form and press send. you have to enter some unusual combinations or choose older data to prevent the program's cache function from sending an immediate reply.

as you can see, the bar grows only in very big steps, although every image part of the bar is actually sent separately. some internal cache in the mod_proxy module prevents the results from being sent until (i guess) 8192 bytes are reached. i already added 1024 spaces to every packet to make it go a little more smoothly, but this is unacceptable for people with low bandwidth.

can you help me with this cosmetic issue?

thx
matthias


Re: mod_proxy vs. serverpush

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/05/2006 11:27 PM, Graham Leggett wrote:
> Ruediger Pluem wrote:

[..cut..]

> Looking deeper into this, if the above was never true, then the loop
> would spin resulting in 100% processor usage for this process/thread
> while the download was running.
> 
> Are you 100% sure this is never called?

Meanwhile I have found the reason for this behaviour: we never actually do non-blocking reads:

#0  socket_bucket_read (a=0x8196748, str=0x40ad052c, len=0x40ad0530, block=APR_BLOCK_READ)
    at buckets/apr_buckets_socket.c:22
#1  0x40052463 in apr_brigade_split_line (bbOut=0x819c3a8, bbIn=0x819c288, block=APR_BLOCK_READ, maxbytes=8192)
    at buckets/apr_brigade.c:292
#2  0x080721d7 in ap_core_input_filter (f=0x8194cb0, b=0x819c3a8, mode=AP_MODE_GETLINE, block=APR_BLOCK_READ, readbytes=0)
    at core_filters.c:155
#3  0x0807c119 in ap_get_brigade (next=0x8194cb0, bb=0x819c3a8, mode=AP_MODE_GETLINE, block=APR_BLOCK_READ, readbytes=583581946604225836)
    at util_filter.c:489
#4  0x404dbb47 in logio_in_filter (f=0x8194c88, bb=0x819c3a8, mode=AP_MODE_GETLINE, block=APR_BLOCK_READ, readbytes=0)
    at mod_logio.c:115
#5  0x0807c119 in ap_get_brigade (next=0x8194c88, bb=0x819c3a8, mode=AP_MODE_GETLINE, block=APR_BLOCK_READ, readbytes=583581774805533996)
    at util_filter.c:489
#6  0x0807f27c in ap_http_filter (f=0x819c330, b=0x8195058, mode=AP_MODE_READBYTES, block=APR_NONBLOCK_READ, readbytes=8192)
    at http_filters.c:292
#7  0x0807c119 in ap_get_brigade (next=0x819c330, bb=0x8195058, mode=AP_MODE_READBYTES, block=APR_NONBLOCK_READ, readbytes=583712238732117292)
    at util_filter.c:489
#8  0x405bae91 in ap_proxy_http_process_response (p=0x8193f38, r=0x8197f80, backend=0x818d908, origin=0x8194790, conf=0x818c9b0, server_portstr=0x40ad282c "")
    at mod_proxy_http.c:1463
#9  0x405bb9d2 in proxy_http_handler (r=0x8197f80, worker=0x818b808, conf=0x818c9b0, url=0x8194780 "/test/long.jsp", proxyname=0x0, proxyport=0)
    at mod_proxy_http.c:1732
#10 0x405a4063 in proxy_run_scheme_handler (r=0x8197f80, worker=0x818b808, conf=0x818c9b0, url=0x8199730 "http://127.0.0.1:8080/test/long.jsp", proxyhost=0x0, proxyport=0)
    at mod_proxy.c:1941

ap_http_filter changes the mode from APR_NONBLOCK_READ to APR_BLOCK_READ.
So I think we need to check whether we can adjust ap_http_filter. I guess this is not an easy task.
Maybe this changes once Brian makes further progress on his async-read branch.

Regards

Rüdiger

Re: mod_proxy vs. serverpush

Posted by Graham Leggett <mi...@sharp.fm>.
Ruediger Pluem wrote:

>>> Anyway, it does not work as expected, as it seems that the condition
>>>     (APR_STATUS_IS_EAGAIN(rv)
>>>         || (rv == APR_SUCCESS && APR_BRIGADE_EMPTY(bb)))
>>> never gets true.

Looking deeper into this, if the above was never true, then the loop 
would spin resulting in 100% processor usage for this process/thread 
while the download was running.

Are you 100% sure this is never called?

Regards,
Graham
--

Re: mod_proxy vs. serverpush

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/05/2006 10:25 PM, Graham Leggett wrote:
> Ruediger Pluem wrote:
> 

[..cut..]

>> Anyway, it does not work as expected, as it seems that the condition
>>     (APR_STATUS_IS_EAGAIN(rv)
>>         || (rv == APR_SUCCESS && APR_BRIGADE_EMPTY(bb)))
>> never gets true.
> 
> 
> I think this if statement covers the case where a non-blocking read is
> attempted and zero bytes are returned, in which case another
> non-blocking read might also return zero bytes, causing the loop to
> spin at 100% processor usage.
> 
> The problem lies in the code further down:
> 
>                     /* try send what we read */
>                     if (ap_pass_brigade(r->output_filters, bb) != APR_SUCCESS
>                         || c->aborted) {
>                         /* Ack! Phbtt! Die! User aborted! */
>                         backend->close = 1;  /* this causes socket close below */
>                         finish = TRUE;
>                     }

No, I do not think so, judging from the original idea. From my point of view, the typical situation
on the backend side when it wants to flush some data is:

1. It sends some data.
2. It does not send any more data for a while.

If the condition worked as I expect, then all the data from the backend would be read (maybe in
several loop iterations) and thus already passed down the filter chain by the code you mention above.
Then the backend would stop sending data for some time, causing the condition to become true.
This would cause the code to add a flush bucket and pass the brigade down the filter chain.


> 
> Without explicitly adding flush buckets to the output filter stack, the
> output filter stack seems to buffer before sending (rational behaviour).
> 
> To change this, we would need to add an output flush bucket after each
> read.

I do not think that this is a good idea; it leads to too much traffic overhead. Basically I like
the idea: read from the backend as long as data is present (non-blocking). If no more data
is present, flush the filter chain and switch to blocking mode to wait for further data.

Regards

Rüdiger


Re: mod_proxy vs. serverpush

Posted by Graham Leggett <mi...@sharp.fm>.
Ruediger Pluem wrote:

> The logic in mod_proxy_http.c of 2.2.x already tries to address this issue by
> flushing the data if no more data is available from the backend right now:
> 
>                 apr_read_type_e mode = APR_NONBLOCK_READ;
>                 int finish = FALSE;
> 
>                 do {
>                     apr_off_t readbytes;
>                     apr_status_t rv;
> 
>                     rv = ap_get_brigade(rp->input_filters, bb,
>                                         AP_MODE_READBYTES, mode,
>                                         conf->io_buffer_size);
> 
>                     /* ap_get_brigade will return success with an empty brigade
>                      * for a non-blocking read which would block: */
>                     if (APR_STATUS_IS_EAGAIN(rv)
>                         || (rv == APR_SUCCESS && APR_BRIGADE_EMPTY(bb))) {
>                         /* flush to the client and switch to blocking mode */
>                         e = apr_bucket_flush_create(c->bucket_alloc);
>                         APR_BRIGADE_INSERT_TAIL(bb, e);
>                         if (ap_pass_brigade(r->output_filters, bb)
>                             || c->aborted) {
>                             backend->close = 1;
>                             break;
>                         }
>                         apr_brigade_cleanup(bb);
>                         mode = APR_BLOCK_READ;
>                         continue;
>                     }
>                     else if (rv == APR_EOF) {
>                         break;
>                     }
>                     else if (rv != APR_SUCCESS) {
>                         ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c,
>                                       "proxy: error reading response");
>                         break;
>                     }
>                     /* next time try a non-blocking read */
>                     mode = APR_NONBLOCK_READ;
> 
> 
> Anyway, it does not work as expected, as it seems that the condition
>     (APR_STATUS_IS_EAGAIN(rv)
>         || (rv == APR_SUCCESS && APR_BRIGADE_EMPTY(bb)))
> never gets true.

I think this if statement covers the case where a non-blocking read is
attempted and zero bytes are returned, in which case another non-blocking
read might also return zero bytes, causing the loop to spin at 100%
processor usage.

The problem lies in the code further down:

                     /* try send what we read */
                     if (ap_pass_brigade(r->output_filters, bb) != APR_SUCCESS
                         || c->aborted) {
                         /* Ack! Phbtt! Die! User aborted! */
                         backend->close = 1;  /* this causes socket close below */
                         finish = TRUE;
                     }

Without explicitly adding flush buckets to the output filter stack, the 
output filter stack seems to buffer before sending (rational behaviour).

To change this, we would need to add an output flush bucket after each read.

Is this a rational thing to do in the general case? Or should the 
addition of the flush be configurable?

Regards,
Graham
--

Re: mod_proxy vs. serverpush

Posted by Ruediger Pluem <rp...@apache.org>.

On 01/05/2006 09:04 PM, Graham Leggett wrote:
> Matthias Behrens wrote:

[..cut..]

> 
> This is an interesting problem, but definitely worth looking into fixing.

The logic in mod_proxy_http.c of 2.2.x already tries to address this issue by
flushing the data if no more data is available from the backend right now:

                apr_read_type_e mode = APR_NONBLOCK_READ;
                int finish = FALSE;

                do {
                    apr_off_t readbytes;
                    apr_status_t rv;

                    rv = ap_get_brigade(rp->input_filters, bb,
                                        AP_MODE_READBYTES, mode,
                                        conf->io_buffer_size);

                    /* ap_get_brigade will return success with an empty brigade
                     * for a non-blocking read which would block: */
                    if (APR_STATUS_IS_EAGAIN(rv)
                        || (rv == APR_SUCCESS && APR_BRIGADE_EMPTY(bb))) {
                        /* flush to the client and switch to blocking mode */
                        e = apr_bucket_flush_create(c->bucket_alloc);
                        APR_BRIGADE_INSERT_TAIL(bb, e);
                        if (ap_pass_brigade(r->output_filters, bb)
                            || c->aborted) {
                            backend->close = 1;
                            break;
                        }
                        apr_brigade_cleanup(bb);
                        mode = APR_BLOCK_READ;
                        continue;
                    }
                    else if (rv == APR_EOF) {
                        break;
                    }
                    else if (rv != APR_SUCCESS) {
                        ap_log_cerror(APLOG_MARK, APLOG_ERR, rv, c,
                                      "proxy: error reading response");
                        break;
                    }
                    /* next time try a non-blocking read */
                    mode = APR_NONBLOCK_READ;


Anyway, it does not work as expected, as it seems that the condition

    (APR_STATUS_IS_EAGAIN(rv)
        || (rv == APR_SUCCESS && APR_BRIGADE_EMPTY(bb)))

never becomes true. I did not have the time to dig in deeper, but maybe you want
to do some searching.
BTW: mod_proxy_ajp currently addresses this issue with a band-aid until
the AJP protocol gets some sort of flush command. It is also based on
a check whether more data is available from the backend right now.

[..cut..]

Regards

Rüdiger

Re: mod_proxy vs. serverpush

Posted by Graham Leggett <mi...@sharp.fm>.
Matthias Behrens wrote:

> i am running a CGI server farm behind an Apache web server with mod_rewrite / mod_proxy
> 
> one CGI is supposed to send a progress bar using server push.
> every time the program completes 5% of its work, it sends further HTML code which lets the bar grow.
> 
> you can all test this by going to 
> 
> http://www.gulp.de/kb/tools/trend.htm
> 
> just type some IT skills into the form and press send. you have to enter some unusual combinations or choose older data to prevent the program's cache function from sending an immediate reply.
> 
> as you can see, the bar grows only in very big steps, although every image part of the bar is actually sent separately. some internal cache in the mod_proxy module prevents the results from being sent until (i guess) 8192 bytes are reached. i already added 1024 spaces to every packet to make it go a little more smoothly, but this is unacceptable for people with low bandwidth.

This is an interesting problem, but definitely worth looking into fixing.

The 8k buffer is correct: the proxy tries to read up to 8k at a time, and
then, after receiving a full 8k, sends that 8k down the filter stack and
over the network to the browser.

I think the root of the problem is that there is no way (or maybe there 
is and I don't know how to do it yet) to say to the input filter stack 
"give me what you got up to a maximum of 8k". If 10 bytes had arrived, 
then 10 bytes would be returned, and the next read would block waiting 
for the next piece to arrive.

The downside of this approach is that if a backend server wrote one byte 
at a time to the filter stack, then the proxy would write out chunks 
containing one byte per chunk, resulting in a large multiplication of 
bandwidth. Perhaps making this configurable would be the answer.

Regards,
Graham
--