Posted to dev@httpd.apache.org by Roman Gavrilov <ro...@aduva.com> on 2004/10/20 17:08:10 UTC

mod_proxy reverse proxy optimization/performance question

I am using a reverse proxy to cache a remote site. The files are mostly
rpms of varying sizes: 3-30 MB or more.
If several requests arrive for the same file while it is not yet cached
locally, every one of them downloads the file from the remote site. This
slows each download down, because the throughput of the line is split
among all the processes.
So when many processes are downloading the same rpm from a remote site,
a single request can take a long time to complete.
This can bring Apache to a state where it cannot serve other requests,
because all available processes are already busy.


In my opinion it would be more efficient to let one process complete the 
request (using maximum line throughput) and return some busy code to 
other identical, simultaneous requests until the file is cached locally.
Has anyone run into a similar situation? What solution did you find?

I have created a solution, as I could not find an existing one. I would
like to discuss it here and get your opinions.
1. When the proxy accepts a request for a file that is not yet in the
local cache, it creates a temporary lock file (named after the proxy's
pathname of the file, with directory slashes replaced by underscores).
2. Other processes requesting the same file first check for the lock
file. If it exists, they return a busy code (e.g. 408 Request Timeout),
and the client is expected to repeat the request until it succeeds.
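
The two steps above can be sketched roughly as follows. This is only an
illustration in Python, not the Apache 1.3 code itself: the names
lock_path, try_lock, unlock and handle_request, the ".lock" suffix and
the fetch callback are all made up for the example, and the cache lookup
that would come before it is left out.

```python
import os

BUSY_STATUS = 408  # the busy code suggested above; an assumption, not a fixed choice


def lock_path(lock_dir, cache_path):
    """Lock-file name: the file's pathname with slashes turned into underscores."""
    name = cache_path.strip("/").replace("/", "_") + ".lock"  # ".lock" is hypothetical
    return os.path.join(lock_dir, name)


def try_lock(lock_dir, cache_path):
    """Return True if this process created the lock, False if it already exists."""
    try:
        # O_CREAT | O_EXCL makes "check for the lock, then create it" one atomic step.
        fd = os.open(lock_path(lock_dir, cache_path),
                     os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def unlock(lock_dir, cache_path):
    """Remove the lock once the file is fully cached (or the fetch failed)."""
    os.remove(lock_path(lock_dir, cache_path))


def handle_request(lock_dir, cache_path, fetch):
    """First requester downloads the file; the others get the busy code."""
    if not try_lock(lock_dir, cache_path):
        return BUSY_STATUS
    try:
        fetch(cache_path)  # hypothetical callback that downloads into the cache
        return 200
    finally:
        unlock(lock_dir, cache_path)
```

Two caveats with this sketch: replacing slashes with underscores maps
a/b_c and a_b/c to the same lock name, and a process that dies in the
middle of a download leaves its lock behind, so some expiry for old lock
files would probably be needed.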

Please let me know what you think of this approach, especially if you 
have done or seen something similar.
Apache version 1.3.x

Thank you
Roman

-- 
-------------------------------------------------------------
I am root. If you see me laughing... You better have a backup!





Re: mod_proxy reverse proxy optimization/performance question

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Thu, 21 Oct 2004, Roman Gavrilov wrote:

> No. When an https request reaches the server (Apache), it is decrypted
> first and then passed through the Apache routines, so by the time it gets
> to the proxy part the URI is already decrypted. The proxy in turn issues a
> request to the backend https server and returns the answer to the client,
> after caching it of course.

Well, it's the same as I described.
No, mod_accel cannot connect to the backend using https.

> Roman
>
> Igor Sysoev wrote:
>
> >On Thu, 21 Oct 2004, Roman Gavrilov wrote:
> >
> >
> >
> >>I don't see any problem using it; actually I am doing it. I am not
> >>talking about proxying between http and https.
> >>Mostly it's used for mirroring (both frontend and backend use https only),
> >>no redirections on backend though :)
> >>
> >>
> >>ProxyPass /foo/bar https://mydomain/foobar/
> >>ProxyPassReverse /foo/bar https://mydomain/foobar/
> >>
> >>I'll be more than glad to discuss it with you.
> >>
> >>
> >
> >So the proxy should decrypt the stream, find the URI, then encrypt it and
> >pass it encrypted to the backend?


Igor Sysoev
http://sysoev.ru/en/

Re: mod_proxy reverse proxy optimization/performance question

Posted by Roman Gavrilov <ro...@aduva.com>.
No. When an https request reaches the server (Apache), it is decrypted
first and then passed through the Apache routines, so by the time it gets
to the proxy part the URI is already decrypted. The proxy in turn issues a
request to the backend https server and returns the answer to the client,
after caching it of course.

Roman

Igor Sysoev wrote:

>On Thu, 21 Oct 2004, Roman Gavrilov wrote:
>
>  
>
>>I don't see any problem using it; actually I am doing it. I am not
>>talking about proxying between http and https.
>>Mostly it's used for mirroring (both frontend and backend use https only),
>>no redirections on backend though :)
>>
>>
>>ProxyPass /foo/bar https://mydomain/foobar/
>>ProxyPassReverse /foo/bar https://mydomain/foobar/
>>
>>I'll be more than glad to discuss it with you.
>>    
>>
>
>So the proxy should decrypt the stream, find the URI, then encrypt it and
>pass it encrypted to the backend?
>
>
>Igor Sysoev
>http://sysoev.ru/en/
>
>
>  
>

Re: mod_proxy reverse proxy optimization/performance question

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Thu, 21 Oct 2004, Roman Gavrilov wrote:

> I don't see any problem using it; actually I am doing it. I am not
> talking about proxying between http and https.
> Mostly it's used for mirroring (both frontend and backend use https only),
> no redirections on backend though :)
>
>
> ProxyPass /foo/bar https://mydomain/foobar/
> ProxyPassReverse /foo/bar https://mydomain/foobar/
>
> I'll be more than glad to discuss it with you.

So the proxy should decrypt the stream, find the URI, then encrypt it and
pass it encrypted to the backend?


Igor Sysoev
http://sysoev.ru/en/

Re: mod_proxy reverse proxy optimization/performance question

Posted by Roman Gavrilov <ro...@aduva.com>.
I don't see any problem using it; actually I am doing it. I am not
talking about proxying between http and https.
Mostly it's used for mirroring (both frontend and backend use https only),
no redirections on backend though :)


ProxyPass /foo/bar https://mydomain/foobar/
ProxyPassReverse /foo/bar https://mydomain/foobar/

I'll be more than glad to discuss it with you.

Regards
Roman


Igor Sysoev wrote:

>On Thu, 21 Oct 2004, Roman Gavrilov wrote:
>
>  
>
>>After checking mod_accel I found that it works only with http;
>>we need the cache and proxy to work with both http and https.
>>What was the reason for disabling https proxying and caching?
>>    
>>
>
>How do you intend to do https reverse proxying?
>
>
>Igor Sysoev
>http://sysoev.ru/en/
>
>
>  
>

Re: mod_proxy reverse proxy optimization/performance question

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Thu, 21 Oct 2004, Roman Gavrilov wrote:

> After checking mod_accel I found that it works only with http;
> we need the cache and proxy to work with both http and https.
> What was the reason for disabling https proxying and caching?

How do you intend to do https reverse proxying?


Igor Sysoev
http://sysoev.ru/en/

Re: mod_proxy reverse proxy optimization/performance question

Posted by Roman Gavrilov <ro...@aduva.com>.
After checking mod_accel I found that it works only with http;
we need the cache and proxy to work with both http and https.
What was the reason for disabling https proxying and caching?

Regards,
Roman

Igor Sysoev wrote:

>On Thu, 21 Oct 2004, Roman Gavrilov wrote:
>
>  
>
>>So what would you suggest I do?
>>Implement it myself?
>>    
>>
>
>No, just look at http://sysoev.ru/mod_accel/
>It's an Apache 1.3 module, which is what you need.
>
>Igor Sysoev
>http://sysoev.ru/en/
>
>
>  
>
>>Bill Stoddard wrote:
>>
>>    
>>
>>>Graham Leggett wrote:
>>>
>>>      
>>>
>>>>Roman Gavrilov wrote:
>>>>
>>>>        
>>>>
>>>>>In my opinion it would be more efficient to let one process complete
>>>>>the request (using maximum line throughput) and return some busy
>>>>>code to other identical, simultaneous requests  until the file is
>>>>>cached locally.
>>>>>Has anyone run into a similar situation? What solution did you find?
>>>>>          
>>>>>
>>>>In the original design for mod_cache, the second and subsequent
>>>>connections to a file that was still in the process of being
>>>>downloaded into the cache would shadow the cached file - in other
>>>>words it would serve content from the cached file as and when it was
>>>>received by the original request.
>>>>
>>>>The file in the cache was to be marked as "still busy downloading",
>>>>which meant threads/processes serving from the cached file would know
>>>>to keep trying to serve the cached file until the "still busy
>>>>downloading" status was cleared by the initial request. Timeouts
>>>>would sanity check the process.
>>>>
>>>>This prevents the "load spike" that occurs just after a file is
>>>>downloaded anew, but before that download is done.
>>>>
>>>>Whether this was implemented fully I am not sure - anyone?
>>>>        
>>>>
>>>It was never implemented.
>>>      
>>>
>
>
>  
>

Re: mod_proxy reverse proxy optimization/performance question

Posted by Igor Sysoev <is...@rambler-co.ru>.
On Thu, 21 Oct 2004, Roman Gavrilov wrote:

> So what would you suggest I do?
> Implement it myself?

No, just look at http://sysoev.ru/mod_accel/
It's an Apache 1.3 module, which is what you need.

Igor Sysoev
http://sysoev.ru/en/


> Bill Stoddard wrote:
>
> > Graham Leggett wrote:
> >
> >> Roman Gavrilov wrote:
> >>
> >>> In my opinion it would be more efficient to let one process complete
> >>> the request (using maximum line throughput) and return some busy
> >>> code to other identical, simultaneous requests  until the file is
> >>> cached locally.
> >>> Has anyone run into a similar situation? What solution did you find?
> >>
> >> In the original design for mod_cache, the second and subsequent
> >> connections to a file that was still in the process of being
> >> downloaded into the cache would shadow the cached file - in other
> >> words it would serve content from the cached file as and when it was
> >> received by the original request.
> >>
> >> The file in the cache was to be marked as "still busy downloading",
> >> which meant threads/processes serving from the cached file would know
> >> to keep trying to serve the cached file until the "still busy
> >> downloading" status was cleared by the initial request. Timeouts
> >> would sanity check the process.
> >>
> >> This prevents the "load spike" that occurs just after a file is
> >> downloaded anew, but before that download is done.
> >>
> >> Whether this was implemented fully I am not sure - anyone?
> >
> >
> > It was never implemented.

Re: mod_proxy reverse proxy optimization/performance question

Posted by Graham Leggett <mi...@sharp.fm>.
Roman Gavrilov wrote:

> So what would you suggest I do?
> Implement it myself?

At the moment that's probably your best option.

Is this for Apache v1.3 or v2.0?

Regards,
Graham
--

Re: mod_proxy reverse proxy optimization/performance question

Posted by Roman Gavrilov <ro...@aduva.com>.
So what would you suggest I do?
Implement it myself?


Bill Stoddard wrote:

> Graham Leggett wrote:
>
>> Roman Gavrilov wrote:
>>
>>> In my opinion it would be more efficient to let one process complete 
>>> the request (using maximum line throughput) and return some busy 
>>> code to other identical, simultaneous requests  until the file is 
>>> cached locally.
>>> Has anyone run into a similar situation? What solution did you find?
>>
>>
>>
>> In the original design for mod_cache, the second and subsequent 
>> connections to a file that was still in the process of being 
>> downloaded into the cache would shadow the cached file - in other 
>> words it would serve content from the cached file as and when it was 
>> received by the original request.
>>
>> The file in the cache was to be marked as "still busy downloading", 
>> which meant threads/processes serving from the cached file would know 
>> to keep trying to serve the cached file until the "still busy 
>> downloading" status was cleared by the initial request. Timeouts 
>> would sanity check the process.
>>
>> This prevents the "load spike" that occurs just after a file is 
>> downloaded anew, but before that download is done.
>>
>> Whether this was implemented fully I am not sure - anyone?
>
>
> It was never implemented.
>
> Bill
>
>

Re: mod_proxy reverse proxy optimization/performance question

Posted by Bill Stoddard <bi...@wstoddard.com>.
Graham Leggett wrote:

> Roman Gavrilov wrote:
> 
>> In my opinion it would be more efficient to let one process complete 
>> the request (using maximum line throughput) and return some busy code 
>> to other identical, simultaneous requests  until the file is cached 
>> locally.
>> Has anyone run into a similar situation? What solution did you find?
> 
> 
> In the original design for mod_cache, the second and subsequent 
> connections to a file that was still in the process of being downloaded 
> into the cache would shadow the cached file - in other words it would 
> serve content from the cached file as and when it was received by the 
> original request.
> 
> The file in the cache was to be marked as "still busy downloading", 
> which meant threads/processes serving from the cached file would know to 
> keep trying to serve the cached file until the "still busy downloading" 
> status was cleared by the initial request. Timeouts would sanity check 
> the process.
> 
> This prevents the "load spike" that occurs just after a file is 
> downloaded anew, but before that download is done.
> 
> Whether this was implemented fully I am not sure - anyone?

It was never implemented.

Bill


Re: mod_proxy reverse proxy optimization/performance question

Posted by Graham Leggett <mi...@sharp.fm>.
Roman Gavrilov wrote:

> In my opinion it would be more efficient to let one process complete the 
> request (using maximum line throughput) and return some busy code to 
> other identical, simultaneous requests  until the file is cached locally.
> Has anyone run into a similar situation? What solution did you find?

In the original design for mod_cache, the second and subsequent 
connections to a file that was still in the process of being downloaded 
into the cache would shadow the cached file - in other words it would 
serve content from the cached file as and when it was received by the 
original request.

The file in the cache was to be marked as "still busy downloading", 
which meant threads/processes serving from the cached file would know to 
keep trying to serve the cached file until the "still busy downloading" 
status was cleared by the initial request. Timeouts would sanity check 
the process.

This prevents the "load spike" that occurs just after a file is 
downloaded anew, but before that download is done.
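
To make the shadowing idea concrete, here is a rough sketch in Python of
what a second or later request might do. It is not mod_cache code (the
design was never implemented, as noted elsewhere in this thread). The
function name serve_shadowed, the separate "busy" marker file and the
timeout, poll and chunk parameters are all assumptions made for the
example.

```python
import os
import time


def serve_shadowed(data_path, busy_path, timeout=30.0, poll=0.05, chunk=65536):
    """Yield the cached file's bytes as they arrive, until the busy marker is gone.

    data_path: partially written cache file (hypothetical layout).
    busy_path: marker file that exists while the first request is still downloading.
    """
    deadline = time.monotonic() + timeout
    with open(data_path, "rb") as f:
        while True:
            # Sample the marker before reading, so bytes written just before
            # the marker was removed are never missed.
            still_busy = os.path.exists(busy_path)
            data = f.read(chunk)
            if data:
                deadline = time.monotonic() + timeout  # progress resets the timeout
                yield data
            elif not still_busy:
                return  # download finished and everything has been served
            elif time.monotonic() > deadline:
                raise TimeoutError("original download appears to have stalled")
            else:
                time.sleep(poll)  # no new bytes yet; wait for the first request
```

The timeout here plays the role of the sanity check mentioned above: if
the first request stops making progress, the shadowing requests give up
instead of waiting forever.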

Whether this was implemented fully I am not sure - anyone?

Regards,
Graham
--