Posted to repository@apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2004/07/14 16:48:24 UTC

ASF Repository, closer.cgi and Depot

Sorry for the cross post but this seems relevant to both these groups.

I was thinking about the subject of mirroring and redirection for the
ASF Repository. There was recently some discussion on the Depot list
concerning this. I feel we could address this subject again in the
interest of both groups.

www.apache.org/dyn/closer.cgi provides a simple resolution strategy that
attempts to determine the closest mirror available to the client browser.
It then generates an HTML page via a template that lists the selected
mirror as well as other available mirrors. With Depot, we have a
customized download client that could be extended to manage downloading
from a list of mirrors as well.

Here are my thoughts on this subject:

A.) This script is really not that big (90% of it is just parsing the
mirrors file), and the database (a flat text file called mirrors.list)
is not very big either. While closer.cgi is a neat service for
browsers, it's not exactly helpful for automated clients. Yet
mirrors.list is an excellent example of metadata that is exposed in an
effective manner such that automated clients can access it.

http://www.apache.org/mirrors/mirrors.list

I'm somewhat convinced that it would be simple to create a client
implementation which accomplishes the same functionality as closer.cgi
programmatically, so that it could be used to resolve a location to
download from when mirrors are available.
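
As a rough sketch of what such a client-side resolver might look like,
the Python below parses a mirror list and picks a mirror. Everything
specific in it is an assumption made for illustration: the column layout
(protocol, country code, URL, separated by whitespace), the names
Mirror, parse_mirrors and pick_mirror, and the rule that "closest" means
"same country code". The real mirrors.list layout and the logic inside
closer.cgi may well differ.

```python
from typing import List, NamedTuple, Optional


class Mirror(NamedTuple):
    protocol: str  # e.g. "http" or "ftp"
    country: str   # lower-cased country code, e.g. "us"
    url: str       # base URL of the mirror


def parse_mirrors(text: str) -> List[Mirror]:
    """Parse a mirror list, skipping blank lines and '#' comments.

    ASSUMED layout per line: '<protocol> <country> <url> [ignored...]'.
    """
    mirrors = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # too few columns for the assumed layout
        mirrors.append(Mirror(fields[0].lower(), fields[1].lower(), fields[2]))
    return mirrors


def pick_mirror(mirrors: List[Mirror], country: str,
                protocol: str = "http") -> Optional[Mirror]:
    """Prefer a mirror in the client's country, else the first candidate."""
    candidates = [m for m in mirrors if m.protocol == protocol]
    for mirror in candidates:
        if mirror.country == country.lower():
            return mirror
    return candidates[0] if candidates else None
```

A client that knows its own country code would call
pick_mirror(parse_mirrors(text), "de") and fall back to the first listed
mirror when nothing local is available.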

This would be beneficial to the Apache Bandwidth issue in that if a 
client such as Depot/DownloadManager managed the same capability as 
closer.cgi then:

1.) To determine if the list file has been updated, all one needs to do
is a HEAD request on the file and review the Last-Modified date,
downloading it if it is newer than the client's local copy.

2.) Today, Apache server CPU time is spent parsing this file for each
"closer.cgi" request on the server side; instead, the client would spend
the CPU time doing this calculation. After the initial HEAD request to
check when the mirror list was last updated, no other requests occur to
www.apache.org in the download process.
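
The freshness check in point 1 could be sketched roughly as follows.
This is only an illustration under stated assumptions: the function
names (is_newer, remote_last_modified, refresh_if_stale) are made up
here, the local copy's file modification time stands in for "the client
local copy" date, and the URL is the one quoted above, which may not be
served in this form any more.

```python
import os
import urllib.request
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

MIRRORS_URL = "http://www.apache.org/mirrors/mirrors.list"  # URL from the post


def is_newer(last_modified: Optional[str],
             local_mtime: Optional[datetime]) -> bool:
    """Return True if the remote copy should be downloaded.

    local_mtime must be timezone-aware. A missing local copy or a missing
    or unparsable header is treated as "download".
    """
    if local_mtime is None or not last_modified:
        return True
    try:
        remote = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return True
    if remote.tzinfo is None:
        # Header gave no usable zone; HTTP dates are defined in GMT.
        remote = remote.replace(tzinfo=timezone.utc)
    return remote > local_mtime


def remote_last_modified(url: str = MIRRORS_URL) -> Optional[str]:
    """Issue a HEAD request and return the raw Last-Modified header."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Last-Modified")


def refresh_if_stale(path: str, url: str = MIRRORS_URL) -> bool:
    """Download the list to `path` only if the server copy is newer."""
    local = None
    if os.path.exists(path):
        local = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    if not is_newer(remote_last_modified(url), local):
        return False
    with urllib.request.urlopen(url, timeout=10) as response:
        data = response.read()
    with open(path, "wb") as handle:
        handle.write(data)
    return True
```

With this shape, each run costs one HEAD request, plus one GET only when
the list has actually changed.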

B.) Downfalls?

1.) If such a service were server-side, we do get a centralized way of 
managing it.

But it's difficult to control HTTP client behavior from the server
beyond the most simplistic of "HTTP redirects", and the cost of
downloading a file becomes much greater in that each download request
has to be redirected through closer.cgi.

2.) Statistics: The one benefit that I do see is that one could log
requests through closer.cgi to track download statistics.

But these again would only be "partial stats" because any browser can
simply bookmark a mirror and go to it directly. It seems more
appropriate that a "download stats" tool would operate behind the
scenes on all the mirrors, with results aggregated across them to
gain more accuracy in such statistics.


Cheers,
-Mark


Re: ASF Repository, closer.cgi and Depot

Posted by Nick Chalko <ni...@chalko.com>.
Mark R. Diggory wrote:

>
> But then this becomes a project spanning both the Repository group and
> the various clients out there "Depot/Maven/etc". And agreement on the
> GEO_IP request protocol, XML format, etc. becomes a touchy subject,
> doesn't it?
>
> -Mark


True, but at this point I would just get some format, any format, out
there, let some usage get started, and then we can agree on the
"correct" format.
I don't want to stagnate waiting on permission.

R,
Nick


Re: ASF Repository, closer.cgi and Depot

Posted by Erik Abele <er...@codefaktor.de>.
On 14.07.2004, at 19:57, Mark R. Diggory wrote:

> But then this becomes a project spanning both the Repository group and
> the various clients out there "Depot/Maven/etc". And agreement on the
> GEO_IP request protocol, XML format, etc. becomes a touchy subject,
> doesn't it?

Maybe, but as I understood it, that's exactly the task assigned to the
repository@ list, no? Setting up a basic repository framework on which
the other projects can build their tools.

Well, spec work always gets a bit touchy ;-)

Cheers,
Erik

> -Mark

Re: ASF Repository, closer.cgi and Depot

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Erik Abele wrote:

>> I suspect their views would include what you suggest, that
>> distribution might save some nominal (c.f. artifact sizes) bandwidth
>> & give some CPU saving, but it'd be at significant loss of 'control'
>> (of well behaved clients). Central control over this seems the most
>> appealing.
>
>
> Agreed.
>
>> Since I doubt the CPU cycles are worth saving (or the script would've
>> been optimised), could we not just change the script to check for some
>> header from the client, and return XML or some structured text, for
>> non-human browsers? [BTW: viewcvs seems to do this nicely, returning
>> the file if the client is non-human and the presentation if it is
>> human (as the browser identifies itself).]
>
>
> This sounds promising. You have central control, you get the 
> geoip-mapping stuff for free and the CPU cycles as well as the 
> bandwidth for (XML-ized) responses are a no-brainer in this case.


But then this becomes a project spanning both the Repository group and
the various clients out there "Depot/Maven/etc". And agreement on the
GEO_IP request protocol, XML format, etc. becomes a touchy subject,
doesn't it?

-Mark

Re: ASF Repository, closer.cgi and Depot

Posted by Erik Abele <er...@codefaktor.de>.
On 14.07.2004, at 17:40, Adam R. B. Jack wrote:

> ----- Original Message -----
> From: "Mark R. Diggory" <md...@latte.harvard.edu>
> To: <re...@apache.org>; <de...@incubator.apache.org>
> Sent: Wednesday, July 14, 2004 8:48 AM
> Subject: ASF Repository, closer.cgi and Depot
>
>
>> ...snipped a lot of good points...
>
> Hmm, it seems to me that infra@ or mirrors@ (is that a list?) probably
> have views on this. (But then, we probably don't want 4 lists on
> here. :-)

mirrors@ is a list but mainly for mirror maintainers. There's no one on 
mirrors@ who isn't sub'ed to infra....

> I suspect their views would include what you suggest, that
> distribution might save some nominal (c.f. artifact sizes) bandwidth
> & give some CPU saving, but it'd be at significant loss of 'control'
> (of well behaved clients). Central control over this seems the most
> appealing.

Agreed.

> Since I doubt the CPU cycles are worth saving (or the script would've
> been optimised), could we not just change the script to check for some
> header from the client, and return XML or some structured text, for
> non-human browsers? [BTW: viewcvs seems to do this nicely, returning
> the file if the client is non-human and the presentation if it is
> human (as the browser identifies itself).]

This sounds promising. You have central control, you get the 
geoip-mapping stuff for free and the CPU cycles as well as the 
bandwidth for (XML-ized) responses are a no-brainer in this case.

just my 2 cents...

Cheers,
Erik

> regards,
> Adam

Re: ASF Repository, closer.cgi and Depot

Posted by "Adam R. B. Jack" <aj...@trysybase.com>.
----- Original Message ----- 
From: "Mark R. Diggory" <md...@latte.harvard.edu>
To: <re...@apache.org>; <de...@incubator.apache.org>
Sent: Wednesday, July 14, 2004 8:48 AM
Subject: ASF Repository, closer.cgi and Depot


> Sorry for the cross post but this seems relevant to both these groups.
>
> I was thinking about the subject of mirroring and redirection for the
> ASF Repository. Currently, there was some discussion on the Depot list
> concerning this. I feel we could address this subject again for both
> groups interest.
>
> www.apache.org/dyn/closer.cgi provides a simple resolution strategy that
> attempts to determine the closest mirror available to the client browser.
> It then generates an HTML page via a template that lists the selected
> mirror as well as other available mirrors. With Depot, we have a
> customized download client that could be extended to manage downloading
> from a list of mirrors as well.
>
> Here are my thoughts on this subject:
>
> A.) This script is really not that big (90% of it is just parsing the
> mirrors file), and the database (a flat text file called mirrors.list)
> is not very big either. While closer.cgi is a neat service for
> browsers, it's not exactly helpful for automated clients. Yet
> mirrors.list is an excellent example of metadata that is exposed in an
> effective manner such that automated clients can access it.
>
> http://www.apache.org/mirrors/mirrors.list
>
> I'm somewhat convinced that it would be simple to create a client
> implementation which accomplishes the same functionality as closer.cgi
> programmatically, so that it could be used to resolve a location to
> download from when mirrors are available.
>
> This would be beneficial to the Apache Bandwidth issue in that if a
> client such as Depot/DownloadManager managed the same capability as
> closer.cgi then:

Hmm, it seems to me that infra@ or mirrors@ (is that a list?) probably have
views on this. (But then, we probably don't want 4 lists on here. :-) I
suspect their views would include what you suggest, that distribution might
save some nominal (c.f. artifact sizes) bandwidth & give some CPU
saving, but it'd be at significant loss of 'control' (of well behaved
clients). Central control over this seems the most appealing.

Since I doubt the CPU cycles are worth saving (or the script would've been
optimised), could we not just change the script to check for some header
from the client, and return XML or some structured text, for non-human
browsers? [BTW: viewcvs seems to do this nicely, returning the file if
the client is non-human and the presentation if it is human (as the
browser identifies itself).]
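
To make the idea concrete, here is a minimal Python sketch of a script
choosing its output from a request header. It is not how closer.cgi
works; the use of the Accept header as "some header", the function names
preferred_format and render_xml, and the <mirrors>/<mirror> element
names are all hypothetical, since no format has been agreed on.

```python
from typing import List, Optional
from xml.sax.saxutils import quoteattr


def preferred_format(accept_header: Optional[str]) -> str:
    """Return "xml" only for clients that ask for XML and not for HTML.

    Browsers usually list text/html in Accept, so they keep getting the
    HTML page; a missing header also falls back to HTML.
    """
    types = [part.split(";")[0].strip().lower()
             for part in (accept_header or "").split(",")]
    if "text/html" in types:
        return "html"
    if "application/xml" in types or "text/xml" in types:
        return "xml"
    return "html"


def render_xml(preferred: str, others: List[str]) -> str:
    """Render the selected mirror and the alternatives as XML.

    The element and attribute names are placeholders for illustration.
    """
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', "<mirrors>"]
    # quoteattr escapes the URL and adds the surrounding quotes.
    lines.append('  <mirror preferred="true" url=%s/>' % quoteattr(preferred))
    for url in others:
        lines.append("  <mirror url=%s/>" % quoteattr(url))
    lines.append("</mirrors>")
    return "\n".join(lines)
```

Mirror selection stays on the server in this arrangement, so central
control and the geo lookup are kept, and only the rendering differs per
client.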

regards,

Adam

