Posted to dev@geronimo.apache.org by Bill Stoddard <wg...@gmail.com> on 2009/01/12 20:40:50 UTC

G download stats

I'm sure most of you know about Vadim's Apache stats project for 
tracking download statistics for the various Apache projects:

http://people.apache.org/~vgritsenko/stats/index.html

It's a fun little project, but it is exceedingly difficult (not to 
mention time consuming) for Vadim to dig into the details of each 
project in order to present project stats in finer detail.

Just out of curiosity, I did some Ruby hacking to modify Vadim's Apache 
log mining script to pull out the Geronimo project data at a finer 
resolution.  Here are the results:

http://people.apache.org/~stoddard/stats/data/

I'll not bother commenting on or summarizing the different results 
because that's exciting in exactly the same way as watching paint dry.

The one item that might need a bit of explaining is the reference to 
'206W', so I'll cover that briefly...  A 'successful' reply to an HTTP 
Range request is a status '206'  response (see RFC 2616 if you want to 
know about range requests).   So the '206' in 206W refers to a 
successful reply to a Range request.  The 'W' means 'weighted'.... more 
on 'W' in a bit.

An example... if the size of a file to download is 100 MB, a client can 
make 10 range requests, each requesting a different 10 MB segment of the 
file.  There are various reasons why a client might issue a range 
request (PDF viewers such as Acrobat, high bandwidth but very high 
latency connections between the server and client, and so forth; the 
reason is not important to this explanation).  Each of the 10 Range 
requests will create a 206 reply entry in the web server's log file.  
So... if we are counting downloads of that 100 MB file, it would be 
incorrect to count each 206 reply as a download.  This is where the 'W' 
(which stands for weighted) comes in: in this case, the '206W' download 
count would be '1'.  
The 10 206 replies are equivalent to 1 download of the 100 MB file.
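
As a rough illustration of the weighting described above (this is not the Ruby script itself, which isn't shown in this thread), here is a minimal Python sketch. It assumes each 206 reply is weighted by the fraction of the full file it carried, which is one way to arrive at the '1' in the example. The function name, the `file_sizes` lookup table, and the artifact path are all hypothetical.

```python
from collections import defaultdict


def weighted_206_counts(entries, file_sizes):
    """Return {path: weighted download count} for 206 replies.

    entries:    iterable of (path, status, bytes_sent) tuples, e.g. taken
                from a parsed access log (parsing is not shown here).
    file_sizes: hypothetical lookup table {path: full file size in bytes}.
    """
    totals = defaultdict(float)
    for path, status, bytes_sent in entries:
        if status != 206:
            continue  # only partial-content replies are weighted here
        size = file_sizes.get(path)
        if not size:
            continue  # unknown or zero size: this reply cannot be weighted
        # Assumption: a reply counts for the fraction of the file it carried.
        totals[path] += bytes_sent / size
    return dict(totals)


MB = 1024 * 1024
path = "/dist/geronimo/example-bin.zip"  # hypothetical artifact path
entries = [(path, 206, 10 * MB) for _ in range(10)]  # ten 10 MB ranges
counts = weighted_206_counts(entries, {path: 100 * MB})
print(round(counts[path], 6))
```

With the ten 10 MB replies from the example, the weighted count for the 100 MB file comes out as a single download.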

fyi...

Bill

Re: G download stats

Posted by Kevan Miller <ke...@gmail.com>.
On Jan 12, 2009, at 2:40 PM, Bill Stoddard wrote:

> I'm sure most of you know about Vadim's Apache stats project for  
> tracking download statistics for the various Apache projects:
>
> http://people.apache.org/~vgritsenko/stats/index.html
>
> A fun little project but exceedingly difficult (not to mention time  
> consuming) for Vadim to dig into the details of each project in  
> order to present project stats with finer details.
>
> Just out of curiosity, I did some Ruby hacking to modify Vadim's  
> apache log mining script to filter out Geronimo project data with  
> finer resolution.  Here are the results:
>
> http://people.apache.org/~stoddard/stats/data/

Cool. Thanks Bill!



Re: G download stats

Posted by Kevan Miller <ke...@gmail.com>.
On Jan 13, 2009, at 9:50 AM, Bill Stoddard wrote:

> Kevan Miller wrote:
>>
>> On Jan 12, 2009, at 2:40 PM, Bill Stoddard wrote:
>>
>>> I'm sure most of you know about Vadim's Apache stats project for  
>>> tracking download statistics for the various Apache projects:
>>>
>>> http://people.apache.org/~vgritsenko/stats/index.html
>>>
>>> A fun little project but exceedingly difficult (not to mention  
>>> time consuming) for Vadim to dig into the details of each project  
>>> in order to present project stats with finer details.
>>>
>>> Just out of curiosity, I did some Ruby hacking to modify Vadim's  
>>> apache log mining script to filter out Geronimo project data with  
>>> finer resolution.  Here are the results:
>>>
>>> http://people.apache.org/~stoddard/stats/data/
>>
>> BTW, if I understand what you are counting, then these statistics  
>> only represent some fraction of the actual downloads (i.e. the  
>> downloads from http://www.apache.org/dist/geronimo/ ).  Downloads
>> from the various Apache mirrors would not be counted... Wondering
>> if hits to the mirroring system
>> (e.g. http://www.apache.org/dyn/closer.cgi/geronimo/ ) would yield
>> a more accurate statistic.
>>
>> --kevan
> These stats include downloads redirected to the mirrors.   I do  
> count the redirects for download artifacts by closer.cgi.   As  
> expected, I have no way to determine if a redirect was successful...  
> it's very possible that some fraction of the hits are duplicates
> (i.e., a mirror failed and the client came back for another mirror).

Cool. Thanks for the info.

--kevan

Re: G download stats

Posted by Bill Stoddard <wg...@gmail.com>.
Kevan Miller wrote:
>
> On Jan 12, 2009, at 2:40 PM, Bill Stoddard wrote:
>
>> I'm sure most of you know about Vadim's Apache stats project for 
>> tracking download statistics for the various Apache projects:
>>
>> http://people.apache.org/~vgritsenko/stats/index.html
>>
>> A fun little project but exceedingly difficult (not to mention time 
>> consuming) for Vadim to dig into the details of each project in order 
>> to present project stats with finer details.
>>
>> Just out of curiosity, I did some Ruby hacking to modify Vadim's 
>> apache log mining script to filter out Geronimo project data with 
>> finer resolution.  Here are the results:
>>
>> http://people.apache.org/~stoddard/stats/data/
>
> BTW, if I understand what you are counting, then these statistics only 
> represent some fraction of the actual downloads (i.e. the downloads 
> from http://www.apache.org/dist/geronimo/ ).
> Downloads from the various Apache mirrors would not be counted... 
> Wondering if hits to the mirroring system 
> (e.g. http://www.apache.org/dyn/closer.cgi/geronimo/ ) would yield a 
> more accurate statistic.
>
> --kevan

These stats include downloads redirected to the mirrors.  I do count 
the redirects for download artifacts by closer.cgi.  As expected, I 
have no way to determine if a redirect was successful... it's very 
possible that some fraction of the hits are duplicates (i.e., a mirror 
failed and the client came back for another mirror).
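
To make the counting above a bit more concrete, here is a small Python sketch that tallies direct hits and closer.cgi redirects separately. It is only an illustration under stated assumptions, not the actual script: the two path prefixes are taken from the URLs mentioned in this thread, while the artifact suffix list, the function name, and the sample paths are hypothetical. It does not try to detect the duplicate hits mentioned above.

```python
DIST_PREFIX = "/dist/geronimo/"              # direct downloads from www.apache.org
CLOSER_PREFIX = "/dyn/closer.cgi/geronimo/"  # redirects handed out to mirrors
ARTIFACT_SUFFIXES = (".zip", ".tar.gz")      # assumed artifact file types


def count_artifact_hits(paths):
    """Count requests for download artifacts, split by how they were served."""
    counts = {"direct": 0, "mirror_redirect": 0}
    for path in paths:
        if not path.endswith(ARTIFACT_SUFFIXES):
            continue  # directory listings, checksums, etc. are ignored
        if path.startswith(CLOSER_PREFIX):
            counts["mirror_redirect"] += 1
        elif path.startswith(DIST_PREFIX):
            counts["direct"] += 1
    return counts


# Hypothetical request paths, as they might appear in an access log.
sample = [
    "/dist/geronimo/example-bin.zip",
    "/dyn/closer.cgi/geronimo/example-bin.tar.gz",
    "/dyn/closer.cgi/geronimo/",
    "/dist/other-project/example-bin.zip",
]
print(count_artifact_hits(sample))
```

A redirect counted this way only says that a client was pointed at a mirror, not that the download from that mirror completed.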

Bill

Re: G download stats

Posted by Kevan Miller <ke...@gmail.com>.
On Jan 12, 2009, at 2:40 PM, Bill Stoddard wrote:

> I'm sure most of you know about Vadim's Apache stats project for  
> tracking download statistics for the various Apache projects:
>
> http://people.apache.org/~vgritsenko/stats/index.html
>
> A fun little project but exceedingly difficult (not to mention time  
> consuming) for Vadim to dig into the details of each project in  
> order to present project stats with finer details.
>
> Just out of curiosity, I did some Ruby hacking to modify Vadim's  
> apache log mining script to filter out Geronimo project data with  
> finer resolution.  Here are the results:
>
> http://people.apache.org/~stoddard/stats/data/

BTW, if I understand what you are counting, then these statistics only  
represent some fraction of the actual downloads (i.e. the downloads  
from http://www.apache.org/dist/geronimo/ ).  Downloads from the  
various Apache mirrors would not be counted... I'm wondering if hits to  
the mirroring system  
(e.g. http://www.apache.org/dyn/closer.cgi/geronimo/ ) would yield a  
more accurate statistic.

--kevan