Posted to modproxy-dev@apache.org by Roman Gavrilov <ro...@aduva.com> on 2004/10/20 15:26:19 UTC

mod_proxy reverse proxy optimization/performance question

I am using a reverse proxy to cache a remote site. The files are mostly 
RPMs, with varying sizes: 3-30M or more.
Now if a number of requests arrive for the same file that is not yet 
cached locally, every one of those requests downloads the file from the 
remote site. Each download is slowed, because the throughput of the line 
is split among all the processes.
So if there are many processes downloading the same RPM from the remote 
site, a single request can take a long time to complete.
This can bring Apache to a state where it cannot serve other requests, 
as all available processes are already busy.


In my opinion it would be more efficient to let one process complete the 
request (using the maximum line throughput) and return some busy code to 
the other identical, simultaneous requests until the file is cached locally.
Has anyone run into a similar situation? What solution did you find?
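
To make the intended client behaviour concrete, here is a rough,
hypothetical sketch using libcurl; the helper names, the 5-second
back-off and the use of 408 as the busy code (see item 2 below) are
just illustrative assumptions, not a finished implementation.

/* Hypothetical client-side retry loop (libcurl). Retries while the proxy
 * answers 408, i.e. while another worker is still filling the cache. */
#include <curl/curl.h>
#include <stdio.h>
#include <unistd.h>

static long fetch_once(const char *url, const char *dest)
{
    long code = 0;
    FILE *out = fopen(dest, "wb");
    CURL *curl = curl_easy_init();

    if (out && curl) {
        curl_easy_setopt(curl, CURLOPT_URL, url);
        curl_easy_setopt(curl, CURLOPT_WRITEDATA, out); /* default callback fwrite()s */
        if (curl_easy_perform(curl) == CURLE_OK)
            curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &code);
    }
    if (curl) curl_easy_cleanup(curl);
    if (out)  fclose(out);
    return code;
}

int fetch_with_retry(const char *url, const char *dest)
{
    long code;
    while ((code = fetch_once(url, dest)) == 408)   /* proxy says "busy" */
        sleep(5);                                   /* back off, then try again */
    return code == 200 ? 0 : -1;
}

In practice the back-off interval and retry limit would need tuning, and
clients that do not know the convention would simply see intermittent 408s.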

I have created a solution, as I did not find an existing one. I would 
like to discuss it here and get your opinions.
1. When the proxy accepts a request for a file that is not yet in the 
local cache, it creates a temporary lock file (named after the proxy's 
pathname for the file, with directory slashes changed to underscores).
2. Other processes requesting the same file first check for that lock 
file. If it exists, they return a busy code to the client (e.g. 408 
Request Timeout), and the client should repeat the request until it 
succeeds. A rough sketch of the server side follows below.
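
Here is that sketch of the server side, written as plain POSIX C rather
than against the actual httpd module API; the lock directory, buffer
sizes and helper names are only assumptions for illustration.

/* Sketch of the lock-file scheme: derive a lock name from the cached
 * file's path by turning '/' into '_', create it atomically, and report
 * "busy" to any process that loses the race. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LOCK_DIR "/var/cache/proxy-locks"   /* assumed location */

/* "/pub/updates/foo.rpm" -> "/var/cache/proxy-locks/pub_updates_foo.rpm" */
static void lock_name(const char *path, char *out, size_t len)
{
    snprintf(out, len, "%s/%s", LOCK_DIR,
             path[0] == '/' ? path + 1 : path);
    for (char *p = out + strlen(LOCK_DIR) + 1; *p; p++)
        if (*p == '/')
            *p = '_';
}

/* Returns a descriptor for the lock if this process won the race and
 * should fetch the file itself, or -1 if another process is already
 * fetching it (the caller then answers 408 Request Timeout). */
static int acquire_fetch_lock(const char *path)
{
    char name[4096];
    lock_name(path, name, sizeof(name));

    /* O_CREAT|O_EXCL is atomic: exactly one process succeeds. */
    return open(name, O_CREAT | O_EXCL | O_WRONLY, 0644);
}

/* The winning process downloads the file into the cache, then removes
 * the lock so later requests are served locally. */
static void release_fetch_lock(const char *path)
{
    char name[4096];
    lock_name(path, name, sizeof(name));
    unlink(name);
}

A real implementation would also have to clean up stale locks left behind
by a crashed or killed process, otherwise one failure could make a file
look permanently "busy".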

Please let me know what you think of this approach, especially if you 
have done or seen something similar.

-- 
-------------------------------------------------------------
I am root. If you see me laughing... You better have a backup!





Re: mod_proxy reverse proxy optimization/performance question

Posted by Graham Leggett <mi...@sharp.fm>.
Roman Gavrilov wrote:

> I am using a reverse proxy to cache a remote site. The files are mostly 
> RPMs, with varying sizes: 3-30M or more.
> Now if a number of requests arrive for the same file that is not yet 
> cached locally, every one of those requests downloads the file from the 
> remote site. Each download is slowed, because the throughput of the line 
> is split among all the processes.
> So if there are many processes downloading the same RPM from the remote 
> site, a single request can take a long time to complete.
> This can bring Apache to a state where it cannot serve other requests, 
> as all available processes are already busy.

This is a mod_cache issue rather than a proxy issue; the best place to 
discuss something like this is dev@httpd.apache.org. (mod_cache was 
separated from mod_proxy in httpd v2.0; this fix never went into the 
httpd v1.3 mod_proxy because it would have been a serious architectural 
change.)

When mod_cache was separated from mod_proxy in httpd v2.0, this exact 
problem was one of the things the new cache code was supposed to solve - 
whether it has stayed solved through all the development done on 
mod_cache in the last while is a good question.

Regards,
Graham
--