Posted to dev@httpd.apache.org by Matthieu Estrade <me...@axiliance.com> on 2002/09/23 11:28:55 UTC
mod_proxy + mod_cache problem, losing EOS bucket
Hi,
I'm working on mod_cache and mod_mem_cache together with mod_proxy,
and I found a problem with the EOS bucket.
mod_cache inserts its cache_in filter into the output filter chain when
it wants to store data in the cache.
When mod_cache is used with mod_proxy, the cache_in filter is called
after mod_proxy calls ap_pass_brigade in
ap_proxy_http_process_response, once data has been received from the backend server.
When I test against a website, I sometimes see that a file is not cached
because length > MaxStreamingBuffer.
When I log the MaxStreamingBuffer value, I always get 0, even
if I set another value in httpd.conf with CacheMaxStreamingBuffer.
So there may be a first bug here.
But the problem doesn't come from this bug.
During my tests, I can see that some files are not cached because
of this error, so I added debug output to the code that calculates the length of
the data to be cached in mod_cache.
And something really strange happens; strange for me, but maybe not for
you....
Here is what happens:
APR_BRIGADE_FOREACH(e, in) {
    if the bucket is EOS: exit with all_bucket_here = 1
    else if the bucket is FLUSH: exit with unresolved_length = 1
    else: size += e->length;
}
For a file that is not cached because of the MaxStreamingBuffer error, the code
above never detects an EOS bucket, so it returns a size to the following code:
if (!all_bucket_here) {
    if (unresolved_length || size > MaxStreamingBuffer)
        exit with the MaxStreamingBuffer error.
}
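
The check described above can be modelled in a few lines of plain C. This is only a sketch under stated assumptions: the `bucket` struct, the `bucket_kind` enum, `scan_brigade` and `should_keep_caching` are hypothetical stand-ins invented for illustration, not the real APR or mod_cache API. It follows the pseudocode as written (stop at EOS or FLUSH, otherwise add up lengths) and shows why a limit of 0 rejects any brigade that does not already contain the EOS bucket.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for APR bucket types; not the real APR API. */
typedef enum { BUCKET_DATA, BUCKET_FLUSH, BUCKET_EOS } bucket_kind;

typedef struct {
    bucket_kind kind;
    long length;            /* payload bytes carried by this bucket */
} bucket;

typedef struct {
    int all_bucket_here;    /* an EOS bucket was found in this brigade */
    int unresolved_length;  /* a FLUSH bucket stopped the scan */
    long size;              /* bytes counted before the scan stopped */
} scan_result;

/* Walk one brigade (modelled as an array) the way the pseudocode does. */
static scan_result scan_brigade(const bucket *brigade, size_t count)
{
    scan_result r = {0, 0, 0};
    assert(brigade != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const bucket *e = &brigade[i];
        if (e->kind == BUCKET_EOS) {
            r.all_bucket_here = 1;
            break;
        }
        if (e->kind == BUCKET_FLUSH) {
            r.unresolved_length = 1;
            break;
        }
        r.size += e->length;
    }
    return r;
}

/* Returns 1 if caching can continue, 0 for the MaxStreamingBuffer error. */
static int should_keep_caching(scan_result r, long max_streaming_buffer)
{
    if (!r.all_bucket_here) {
        if (r.unresolved_length || r.size > max_streaming_buffer)
            return 0;
    }
    return 1;
}
```

With a limit of 0, a brigade holding two 4096-byte data buckets and no EOS is rejected, while the same data followed by an EOS bucket is accepted regardless of the limit.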
What is really strange is that when I add debug output in mod_proxy, at the point where it
has received the data from the backend server and before it passes the brigade to
the output filters, I can detect an EOS bucket.
So I find an EOS bucket in the brigade that mod_proxy passes to
the output filters, but in the mod_cache filter that reads the same brigade
(the cache_in filter), I can't find that EOS bucket.
If someone has an idea :)
I will keep looking into it.
Best regards,
Estrade Matthieu
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Graham Leggett <mi...@sharp.fm>.
Matthieu Estrade wrote:
> I wrote a patch modifying mod_proxy to pass the entire data (response
> from backend server) to output_filter, instead of brigade by brigade.
AFAIK the problem stems from a limitation in mod_cache where it can only
cache stuff in a single brigade at the moment.
Proxy works correctly already - there is no such thing as "pass the
entire data", because there is no way for proxy to know just how much
data there is in advance, and there is no guarantee that data will all
fit in memory at once.
Regards,
Graham
--
-----------------------------------------
minfrin@sharp.fm "There's a moon
over Bourbon Street
tonight..."
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Graham Leggett <mi...@sharp.fm>.
Ian Holsman wrote:
>> The cache filter is supposed to run after all the filters for maximum
>> caching advantage.
>>
> I disagree on this. The cache filter should be able to be placed where the
> admin wants it.
In the advanced case, yes - but in the simplest
turn-it-on-and-it-should-just-work case the cache should be as optimised
as possible.
> What the code should do is remember which filters were placed above the
> cache_input filter, and restore these filters on cache_output.
This starts getting complex.
The original idea for the cache was to be a transparent public proxy cache
layered between Apache and the browser. Because RFC2616 dictates how
such a proxy cache should work, determining whether we have done the
"right" thing should be relatively easy.
As we move the caching filters further towards the core of Apache,
suddenly things start to get complicated, as there is no longer a
guarantee that the point into which cache has been inserted is compliant
with RFC2616 in the first place, which in turn makes things less likely
to work predictably or correctly.
Yes, being able to put the cache wherever the admin wants is in theory a
nice thing to have, but achieving this in practice is something
that's going to have to be looked at pretty carefully.
Regards,
Graham
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Ian Holsman <ia...@apache.org>.
Graham Leggett wrote:
> Matthieu Estrade wrote:
>
>> I agree with you about the proxy...
>> Do you think it's possible to force the cache filter to run after
>> all the proxy filters?
>
>
> The cache filter is supposed to run after all the filters for maximum
> caching advantage.
>
I disagree on this. The cache filter should be able to be placed where the
admin wants it.
What the code should do is remember which filters were placed above the
cache_input filter, and restore these filters on cache_output.
This would allow people to have pages designed for a specific user, while
still caching most of the content.
> Regards,
> Graham
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Graham Leggett <mi...@sharp.fm>.
Matthieu Estrade wrote:
> My problem is that I can't set up cache_in_filter to be executed
> after mod_proxy passes its "last" brigade to the filter chain.
That's because there is never any such thing as the "last" brigade.
Responses are not limited by length anywhere within the filter
subsystem, this is the whole point behind using brigades - long
neverending chains of data that can be of any length, even bigger than
available RAM.
The cache filter works (should work) along the idea that each brigade is
simply added to the one before. If this gets too big, we throw it away
and the object is no longer cached.
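
The idea of adding each brigade to the one before, and giving up once the total gets too big, might look roughly like the sketch below. It is a plain C model written under assumptions: `cache_ctx`, `cache_feed` and the other names are hypothetical and invented for illustration, the payload is a flat byte buffer rather than a bucket brigade, and the `limit` field merely plays the role of CacheMaxStreamingBuffer. It is not how mod_cache is actually implemented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a cache filter's per-request state (not mod_cache's). */
typedef enum { CACHE_COLLECTING, CACHE_COMPLETE, CACHE_ABANDONED } cache_state;

typedef struct {
    cache_state state;
    char *saved;        /* bytes set aside from earlier brigades */
    size_t saved_size;  /* invariant: saved_size <= limit */
    size_t limit;       /* plays the role of CacheMaxStreamingBuffer */
} cache_ctx;

static void cache_ctx_init(cache_ctx *ctx, size_t limit)
{
    assert(ctx != NULL);
    ctx->state = CACHE_COLLECTING;
    ctx->saved = NULL;
    ctx->saved_size = 0;
    ctx->limit = limit;
}

/* Throw away whatever was set aside; the object will not be cached. */
static void cache_ctx_abandon(cache_ctx *ctx)
{
    free(ctx->saved);
    ctx->saved = NULL;
    ctx->saved_size = 0;
    ctx->state = CACHE_ABANDONED;
}

/* Called once per brigade; eos is nonzero when this brigade ends the response. */
static cache_state cache_feed(cache_ctx *ctx, const char *data, size_t len, int eos)
{
    assert(ctx != NULL && (data != NULL || len == 0));
    if (ctx->state != CACHE_COLLECTING)
        return ctx->state;              /* already decided; ignore further data */
    if (len > ctx->limit - ctx->saved_size) {
        cache_ctx_abandon(ctx);         /* too big: give up on caching */
        return ctx->state;
    }
    if (len > 0) {
        char *grown = realloc(ctx->saved, ctx->saved_size + len);
        if (grown == NULL) {
            cache_ctx_abandon(ctx);     /* out of memory: also give up */
            return ctx->state;
        }
        memcpy(grown + ctx->saved_size, data, len);
        ctx->saved = grown;
        ctx->saved_size += len;
    }
    if (eos)
        ctx->state = CACHE_COMPLETE;    /* the whole response is now set aside */
    return ctx->state;
}
```

The point of the design is that the filter never needs a "last" brigade up front: it stays in the collecting state across calls and only decides when it either sees the end of the response or exceeds the limit.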
Regards,
Graham
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by "Paul J. Reder" <re...@remulak.net>.
Matthieu Estrade wrote:
In a previous note you explained:
> For a file that is not cached because of the MaxStreamingBuffer error, the code above never detects an EOS bucket, so it returns a size to the following code:
> if (!all_bucket_here) {
>     if (unresolved_length || size > MaxStreamingBuffer)
>         exit with the MaxStreamingBuffer error.
> }
If it entered this chunk of code, then the cache code either:
1) encountered a bucket with an unresolved length (like a socket bucket), or
2) found that the cumulative length of all buckets/brigades encountered so far was greater
than the value of MaxStreamingBuffer.
If the program goes into this branch of code, it will remove the cache filter
from the filter chain. This is why the cache code is not looking at any of
the subsequent brigades. Cache has already determined that it has no reason
to look at them.
If this is happening due to exceeding MaxStreamingBuffer, try setting the
value to something higher and see if it works for you. If it is encountering
a bucket with an unresolved length then there is nothing that can be done.
That type of bucket cannot be cached.
Proxy is working the way it should.
--
Paul J. Reder
-----------------------------------------------------------
"The strength of the Constitution lies entirely in the determination of each
citizen to defend it. Only if every single citizen feels duty bound to do
his share in this defense are the constitutional rights secure."
-- Albert Einstein
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Matthieu Estrade <me...@axiliance.com>.
Hi Paul,
I know that mod_cache only works on one brigade, but that is not my
problem.
My problem is that I can't set up cache_in_filter to be executed
after mod_proxy passes its "last" brigade to the filter chain.
Currently, mod_cache_filter_in is executed only once, when mod_proxy
has passed the "first" brigade to the output filters.
So the other brigades containing data are not available.
So, do I have to assume that cache_filter_in must be able to
cache a document over several invocations?
Or would it be possible to have the cache filter run once
mod_proxy has finished passing the "last" brigade,
so that it reads all the brigades and caches them?
regards,
Matthieu
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by "Paul J. Reder" <re...@remulak.net>.
Actually, the problem is in the fact that there is more than one
brigade and the cache code can't currently handle that.
Proxy is working the way it is supposed to. This allows Apache
to process large responses without having to buffer the whole
thing inside Apache. Apache uses less memory, and the user starts
seeing results sooner. If proxy were to process all of the
brigades of a large response before the next filter were allowed
access, then proxy could potentially buffer a *huge* amount of
data.
The answer is that the cache code currently only caches responses
that arrive in one brigade. Proxy isn't the problem.
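
To make the streaming behaviour concrete, here is a small self-contained C sketch. Everything in it is an assumption made for illustration: `proxy_stream`, `output_filter_fn`, `probe` and `probe_filter` are hypothetical names, and a byte array split into fixed-size chunks stands in for brigades read from the backend. It shows that a downstream filter is invoked once per chunk, so the end-of-stream marker is only visible on the final call, not on the first one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical downstream filter callback: receives one chunk per call. */
typedef void (*output_filter_fn)(void *ctx, const char *data, size_t len, int eos);

/* Pass `total` bytes of `body` downstream in pieces of at most `chunk` bytes.
 * Only one chunk is handed over at a time, so nothing is buffered here.
 * Returns the number of times the downstream filter was called. */
static size_t proxy_stream(const char *body, size_t total, size_t chunk,
                           output_filter_fn next, void *ctx)
{
    size_t calls = 0;
    size_t off = 0;
    assert(chunk > 0 && next != NULL);
    do {
        size_t left = total - off;
        size_t n = left < chunk ? left : chunk;
        int eos = (off + n == total);   /* set only on the final piece */
        next(ctx, body + off, n, eos);
        off += n;
        calls++;
    } while (off < total);
    return calls;
}

/* A probe filter that records what it saw on each invocation. */
typedef struct {
    size_t calls;
    size_t bytes;
    int eos_on_first_call;
    int eos_seen;
} probe;

static void probe_filter(void *ctx, const char *data, size_t len, int eos)
{
    probe *p = ctx;
    (void)data;                         /* contents are irrelevant to the probe */
    if (p->calls == 0)
        p->eos_on_first_call = eos;
    p->calls++;
    p->bytes += len;
    if (eos)
        p->eos_seen = 1;
}
```

For a 10-byte body streamed in 4-byte chunks, the probe is called three times and sees the end marker only on the third call. A filter that looks at the first call alone concludes the response has no end marker, even though the proxy does eventually send one.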
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Matthieu Estrade <me...@axiliance.com>.
Hi Graham,
the problem is that the filter is called between two ap_pass_brigade calls made by the
reverse proxy...
like this:
proxy:
    get_brigade (data from backend)
    pass_brigade (pass data to output filter)
        cache_filter
    get_brigade (data from backend)
    pass_brigade (pass data to output filter)
for a single response from the backend server.
regards,
Matthieu
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Graham Leggett <mi...@sharp.fm>.
Matthieu Estrade wrote:
> I agree with you about the proxy...
> Do you think it's possible to force the cache filter to run after
> all the proxy filters?
The cache filter is supposed to run after all the filters for maximum
caching advantage.
Regards,
Graham
Re: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Matthieu Estrade <me...@axiliance.com>.
Hi Bill,
I agree with you about the proxy...
Do you think it's possible to force the cache filter to run after
all the proxy filters?
Matthieu
RE: Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Bill Stoddard <bi...@wstoddard.com>.
> Hi again :)
>
> I wrote a patch modifying mod_proxy to pass the entire data (response
> from backend server) to output_filter, instead of brigade by brigade.
>
One comment (having not reviewed the patch): Proxy should stream bytes from
the backend server to the client as those bytes arrive. Proxy should not
require that the entire response from the backend be received before writing
to the client.
Bill
Patch mod_proxy: mod_proxy + mod_cache problem
Posted by Matthieu Estrade <me...@axiliance.com>.
Hi again :)
I wrote a patch modifying mod_proxy to pass the entire data (response
from backend server) to output_filter, instead of brigade by brigade.
It seems to work well...
Matthieu
Re: mod_proxy + mod_cache problem, losing EOS bucket
Posted by Matthieu Estrade <me...@axiliance.com>.
Hi again,
The problem seems to be in the proxy.
When the proxy reads the data and passes it to the output filters, it does:
while (ap_get_brigade) {
    ap_pass_brigade(output_filter)
}
So if the data is not read in a single brigade, mod_cache can't work,
because it will try to cache only the first brigade, which holds just part of the data.
Do you think it's better to modify how the proxy passes data to the
output filters,
or to modify the way mod_cache gets its data from the bucket brigade?
Matthieu