Posted to dev@openoffice.apache.org by Rob Weir <ro...@apache.org> on 2012/05/11 05:06:46 UTC

Download stats script (in progress)

SourceForge has a nice REST API to query for download stats and return
them as JSON objects.  Unfortunately, our directory structure for
AOO 3.4 is rather odd, with English downloads in one place,
translations in another directory, and hashes, installs and
language packs all mixed together.  So getting these stats is a little
painful.  You can't just get the numbers of a single directory and be
done.  It is more complicated than that.
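To illustrate why a single directory total is not enough, here is a minimal sketch that splits per-file download counts into installs, language packs and hash files. The file-name patterns ("langpack" in the name, the listed hash suffixes) and the sample names are assumptions for illustration, not a description of the actual script or of the exact layout on SourceForge.

```python
# Assumed hash/signature extensions; adjust to whatever the mirror holds.
HASH_SUFFIXES = (".md5", ".sha1", ".asc")


def classify(filename):
    """Return 'hash', 'langpack' or 'install' for one file name."""
    name = filename.lower()
    if name.endswith(HASH_SUFFIXES):
        return "hash"
    if "langpack" in name:
        return "langpack"
    return "install"


def totals_by_kind(per_file_totals):
    """Sum {filename: download_count} into one total per kind of file."""
    totals = {"install": 0, "langpack": 0, "hash": 0}
    for filename, count in per_file_totals.items():
        totals[classify(filename)] += count
    return totals


# Made-up counts, for illustration only.
sample = {
    "AOO_3.4.0_Win_x86_install_en-US.exe": 1000,
    "AOO_3.4.0_Win_x86_langpack_de.exe": 40,
    "AOO_3.4.0_Win_x86_install_en-US.exe.md5": 3,
}
print(totals_by_kind(sample))
```

Keeping the three kinds apart means the same data can answer either definition of "downloads", with or without language packs.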

Also, the SF API seems to be rate limited, or at least I'm getting
errors if I query it too much.  That's understandable.

So... I'm coding a simple download stats app, in Python, that will
collect all the relevant stats and produce reports.  It caches on
disk the JSON objects that have already been retrieved, which
avoids the throttling issues and greatly improves
performance.
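A minimal sketch of such a disk cache, written for Python 3; the script itself may well look different. The cache directory name, the SHA-1 file naming and the injectable fetch parameter are all assumptions made for illustration.

```python
import hashlib
import json
import os
import urllib.request


def fetch_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def cached_fetch(url, cache_dir="stats-cache", fetch=fetch_json):
    """Return the JSON object for url, using the network only on a cache miss."""
    os.makedirs(cache_dir, exist_ok=True)
    # Hash the URL so that any query string maps to a safe file name.
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as cached:
            return json.load(cached)
    data = fetch(url)
    with open(path, "w", encoding="utf-8") as out:
        json.dump(data, out)
    return data
```

Entries never expire in this sketch, so it only suits queries over date ranges that are already complete; stats for the current day would go stale.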

Not quite done, but I'll check it in (where?) when it is fully
debugged and validated.  My goal is to have solid numbers for the one
week mark next Tuesday.  And from what I'm seeing so far, the numbers
will be amazing.

But two quick questions to help me finish this:

1) Historically, what did OOo report as "downloads"?  Was this just a
count of full installs?  Or language packs as well?

2) It is easy to produce downloads by language and platform, since our
installs are already defined that way.  But I can also report
per-country.  Is that interesting to anyone?   For example, in Canada,
the most popular downloads are X, Y, Z.
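As a sketch of what the per-country report could look like: the function below turns per-file country counts into a "top N downloads" list for each country. The input shape ({file: {country: count}}) and the sample figures are hypothetical; the real stats data may be organised differently.

```python
from collections import defaultdict


def top_downloads_by_country(per_file_countries, n=3):
    """Invert {file: {country: count}} into {country: [(file, count), ...]}."""
    by_country = defaultdict(list)
    for filename, countries in per_file_countries.items():
        for country, count in countries.items():
            by_country[country].append((filename, count))
    # Sort each country's files by descending count, breaking ties by name.
    return {
        country: sorted(files, key=lambda item: (-item[1], item[0]))[:n]
        for country, files in by_country.items()
    }


# Made-up sample data, for illustration only.
sample = {
    "install_en-US.exe": {"Canada": 120, "France": 10},
    "install_fr.exe": {"Canada": 45, "France": 300},
    "langpack_fr.exe": {"Canada": 5},
}
report = top_downloads_by_country(sample, n=2)
print(report["Canada"])
```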


-Rob

Re: Download stats script (in progress)

Posted by "Marcus (OOo)" <ma...@wtnet.de>.
On 05/11/2012 05:06 AM, Rob Weir wrote:
> SourceForge has a nice REST API to query for download stats and return
> them as JSON objects.  Unfortunately, our directory structure for
> AOO 3.4 is rather odd, with English downloads in one place,
> translations in another directory, and hashes, installs and
> language packs all mixed together.  So getting these stats is a little
> painful.  You can't just get the numbers of a single directory and be
> done.  It is more complicated than that.
>
> Also, the SF API seems to be rate limited, or at least I'm getting
> errors if I query it too much.  That's understandable.
>
> So.... I'm coding a simple download stats app, in python, that will
> collect together all the relevant stats and produce reports.  It
> caches on disk JSON objects that have already been retrieved, which
> eliminates the throttling issues as well as greatly improves
> performance.
>
> Not quite done, but I'll check it in (where?) when it is fully
> debugged and validated.  My goal is to have solid numbers for the one
> week mark next Tuesday.  And from what I'm seeing so far, the numbers
> will be amazing.
>
> But two quick questions to help me finish this:
>
> 1) Historically, what did OOo report as "downloads"?  Was this just a
> count of full installs?  Or language packs as well?

We had binaries, SDK and source counted separately. IMHO langpacks were 
not counted.

But they could now be a nice additional number, just to see how popular
they are at all. ;-)

> 2) It is easy to produce downloads by language and platform, since our
> installs are already defined that way.  But I can also report
> per-country.  Is that interesting to anyone?   For example, in Canada,
> the most popular downloads are X, Y, Z.

Sure, why not. Then we can see which country downloads a language 
version because their native language is not yet supported. Then we can 
put the focus on these languages; given that we have the committers and 
translators for them.

Marcus


Re: Download stats script (in progress)

Posted by Rob Weir <ro...@apache.org>.
On Fri, May 11, 2012 at 12:28 AM, Dave Fisher <da...@comcast.net> wrote:
>
> On May 10, 2012, at 9:10 PM, Juergen Schmidt wrote:
>
>> On Friday, 11. May 2012 at 05:06, Rob Weir wrote:
>>> SourceForge has a nice REST API to query for download stats and return
>>> them as JSON objects. Unfortunately, our directory structure for
>>> AOO 3.4 is rather odd, with English downloads in one place,
>>> translations in another directory, and hashes, installs and
>>> language packs all mixed together. So getting these stats is a little
>>> painful. You can't just get the numbers of a single directory and be
>>> done. It is more complicated than that.
>>>
>>>
>>
>> I noticed this as well and I have also thought about a script or app to collect them ;-) good that you already have started...
>>>
>>> Also, the SF API seems to be rate limited, or at least I'm getting
>>> errors if I query it too much. That's understandable.
>>>
>>> So.... I'm coding a simple download stats app, in python, that will
>>> collect together all the relevant stats and produce reports. It
>>> caches on disk JSON objects that have already been retrieved, which
>>> eliminates the throttling issues as well as greatly improves
>>> performance.
>>>
>>> Not quite done, but I'll check it in (where?)
>> mmh good question,
>
> https://svn.apache.org/repos/asf/incubator/ooo/ooo-site/trunk/tools/.
>
>> Maybe we can integrate a download counter in the webpage. Something that gets automatically updated hourly or twice a day.
>
> We should be able to script publishing of the downloads (or any other) page every hour. This is done for www.apache.org/. Infra will know the details.
>

One approach is to have the Python script produce a file like
aoo-downloads.js containing summary data in the form of a JSON object.
That can be included in any HTML page and then displayed with some
simple scripting.
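A minimal sketch of that idea, assuming the summary is already a plain dict. The variable name aooDownloads and the keys in the sample are made up for illustration.

```python
import json


def write_summary_js(summary, path="aoo-downloads.js", var_name="aooDownloads"):
    """Write summary as a JavaScript file that assigns a JSON object to a variable."""
    with open(path, "w", encoding="utf-8") as out:
        # json.dumps escapes non-ASCII by default, so the output is a valid
        # JavaScript object literal.
        out.write("var %s = %s;\n" % (var_name, json.dumps(summary, sort_keys=True)))


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    write_summary_js({"total": 12345, "updated": "2012-05-11"})
```

A page would then load the file with a script tag and read the values from the (hypothetical) aooDownloads variable.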

There is a really good timeline widget here that I've been meaning to
explore some day.  I think this would be very cool:

http://www.simile-widgets.org/timeplot/

-Rob

> Putting the script in tools makes it accessible. If trunk/bin is more common then that instead.
>
> Regards,
> Dave
>
>>
>>> when it is fully
>>> debugged and validated. My goal is to have solid numbers for the one
>>> week mark next Tuesday. And from what I'm seeing so far, the numbers
>>> will be amazing.
>>>
>>>
>>
>>>
>>> But two quick questions to help me finish this:
>>>
>>> 1) Historically, what did OOo report as "downloads"? Was this just a
>>> count of full installs? Or language packs as well?
>>>
>>>
>>
>> I don't know, but I assume full install sets. I would like numbers that are as detailed as possible.
>>>
>>> 2) It is easy to produce downloads by language and platform, since our
>>> installs are already defined that way. But I can also report
>>> per-country. Is that interesting to anyone? For example, in Canada,
>>> the most popular downloads are X, Y, Z.
>>>
>>>
>>
>> again I would like to have detailed numbers. We can produce nice statistics and graphs ;-)
>>
>> Juergen
>>>
>>>
>>> -Rob
>>
>

Re: Download stats script (in progress)

Posted by Dave Fisher <da...@comcast.net>.
On May 10, 2012, at 9:10 PM, Juergen Schmidt wrote:

> On Friday, 11. May 2012 at 05:06, Rob Weir wrote:
>> SourceForge has a nice REST API to query for download stats and return
>> them as JSON objects. Unfortunately, our directory structure for
>> AOO 3.4 is rather odd, with English downloads in one place,
>> translations in another directory, and hashes, installs and
>> language packs all mixed together. So getting these stats is a little
>> painful. You can't just get the numbers of a single directory and be
>> done. It is more complicated than that.
>> 
>> 
> 
> I noticed this as well and I have also thought about a script or app to collect them ;-) good that you already have started... 
>> 
>> Also, the SF API seems to be rate limited, or at least I'm getting
>> errors if I query it too much. That's understandable.
>> 
>> So.... I'm coding a simple download stats app, in python, that will
>> collect together all the relevant stats and produce reports. It
>> caches on disk JSON objects that have already been retrieved, which
>> eliminates the throttling issues as well as greatly improves
>> performance.
>> 
>> Not quite done, but I'll check it in (where?) 
> mmh good question,

https://svn.apache.org/repos/asf/incubator/ooo/ooo-site/trunk/tools/.

> Maybe we can integrate a download counter in the webpage. Something that gets automatically updated hourly or twice a day.

We should be able to script publishing of the downloads (or any other) page every hour. This is done for www.apache.org/. Infra will know the details.

Putting the script in tools makes it accessible. If trunk/bin is more common then that instead.

Regards,
Dave

> 
>> when it is fully
>> debugged and validated. My goal is to have solid numbers for the one
>> week mark next Tuesday. And from what I'm seeing so far, the numbers
>> will be amazing. 
>> 
>> 
> 
>> 
>> But two quick questions to help me finish this:
>> 
>> 1) Historically, what did OOo report as "downloads"? Was this just a
>> count of full installs? Or language packs as well?
>> 
>> 
> 
> I don't know, but I assume full install sets. I would like numbers that are as detailed as possible.
>> 
>> 2) It is easy to produce downloads by language and platform, since our
>> installs are already defined that way. But I can also report
>> per-country. Is that interesting to anyone? For example, in Canada,
>> the most popular downloads are X, Y, Z.
>> 
>> 
> 
> again I would like to have detailed numbers. We can produce nice statistics and graphs ;-) 
> 
> Juergen
>> 
>> 
>> -Rob 
> 


Re: Download stats script (in progress)

Posted by Louis Suárez-Potts <lu...@gmail.com>.
Juergen Schmidt wrote:
> I don't know, but I assume full install sets. I would like numbers that are as detailed as possible.
>> > 
>> > 2) It is easy to produce downloads by language and platform, since our
>> > installs are already defined that way. But I can also report
>> > per-country. Is that interesting to anyone? For example, in Canada,
>> > the most popular downloads are X, Y, Z.
>> > 
>> > 
> 
> again I would like to have detailed numbers. We can produce nice statistics and graphs ;-) 

Juergen,

Would you want those from OOo or current? I presume current, and we can
even make these accurate. I should think that for OOo, your best bet
really is to look to the DE project's, BR-PT's, ES, if they have
them--Alexandro might?, or Richard Holt, or others in Red.es or
Cenatic--and PLIO, for Italy.

(Other locations and languages would also be obtainable, I'd guess, but
... why?)

Maho might also have data still for JA, which usually demonstrated
itself to be immensely into downloading and using and doing good work
with OOo. :-)

Finally, we all do need to keep in mind the simple fact that those with
Windows usually will have to download OO, but those with Linux... oh,
wait. My, what an interesting new situation. Even so for Mac.

:-)

Louis

=

Louis Suárez-Potts, PhD
President, Age of Peers, Inc.

Re: Download stats script (in progress)

Posted by Juergen Schmidt <jo...@googlemail.com>.
On Friday, 11. May 2012 at 05:06, Rob Weir wrote:
> SourceForge has a nice REST API to query for download stats and return
> them as JSON objects. Unfortunately, our directory structure for
> AOO 3.4 is rather odd, with English downloads in one place,
> translations in another directory, and hashes, installs and
> language packs all mixed together. So getting these stats is a little
> painful. You can't just get the numbers of a single directory and be
> done. It is more complicated than that.
> 
> 

I noticed this as well and I have also thought about a script or app to collect them ;-) good that you already have started... 
> 
> Also, the SF API seems to be rate limited, or at least I'm getting
> errors if I query it too much. That's understandable.
> 
> So.... I'm coding a simple download stats app, in python, that will
> collect together all the relevant stats and produce reports. It
> caches on disk JSON objects that have already been retrieved, which
> eliminates the throttling issues as well as greatly improves
> performance.
> 
> Not quite done, but I'll check it in (where?) 
mmh good question,

Maybe we can integrate a download counter in the webpage. Something that gets automatically updated hourly or twice a day.
 
> when it is fully
> debugged and validated. My goal is to have solid numbers for the one
> week mark next Tuesday. And from what I'm seeing so far, the numbers
> will be amazing. 
> 
> 

> 
> But two quick questions to help me finish this:
> 
> 1) Historically, what did OOo report as "downloads"? Was this just a
> count of full installs? Or language packs as well?
> 
> 

I don't know, but I assume full install sets. I would like numbers that are as detailed as possible.
> 
> 2) It is easy to produce downloads by language and platform, since our
> installs are already defined that way. But I can also report
> per-country. Is that interesting to anyone? For example, in Canada,
> the most popular downloads are X, Y, Z.
> 
> 

again I would like to have detailed numbers. We can produce nice statistics and graphs ;-) 

Juergen
> 
> 
> -Rob 


Re: Download stats script (in progress)

Posted by Rob Weir <ro...@apache.org>.
On Fri, May 11, 2012 at 2:58 PM, Roberto Galoppini <rg...@geek.net> wrote:
> On Fri, May 11, 2012 at 5:06 AM, Rob Weir <ro...@apache.org> wrote:
>
>> SourceForge has a nice REST API to query for download stats and return
>> them as JSON objects.  Unfortunately, our directory structure for
>> AOO 3.4 is rather odd, with English downloads in one place,
>> translations in another directory, and hashes, installs and
>> language packs all mixed together.  So getting these stats is a little
>> painful.  You can't just get the numbers of a single directory and be
>> done.  It is more complicated than that.
>>
>> Also, the SF API seems to be rate limited, or at least I'm getting
>> errors if I query it too much.  That's understandable.
>>
>
> Rob, can you provide me with more info about this, so that we can
> investigate it further?


Hi Roberto,

At the Python level the error is raised by urllib.urlopen().read(), with a
returned error of:

"IOError: [Errno socket error] [Errno 10054] An existing connection
was forcibly closed by the remote host"

I find that this happens when I make many (> 50) requests in a short
period of time (1 or 2 minutes).  My solution right now is to maintain
a local disk cache of prior returned results.  Not only does this
reduce the number of requests I send SF, but it also improves the
speed of the report generation.
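For cases where a request still has to go out, one complementary option (not what the script described here does) is to retry with an increasing delay when the connection is dropped. This is a minimal Python 3 sketch; the function name, the number of attempts and the delays are arbitrary assumptions.

```python
import time


def fetch_with_retry(fetch, url, attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call fetch(url), retrying on connection errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            # In Python 3, IOError and socket errors such as a reset
            # connection are all OSError (sub)classes.
            if attempt == attempts - 1:
                raise
            # Wait 2s, 4s, 8s, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
```

The fetch argument can be any callable that takes a URL, so the same wrapper works around a cached or an uncached fetch.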

In any case, I'm happy with the caching solution, so this is not a
blocking issue for me right now.

-Rob

> We plan to share some stats figures next week, as we did previously for
> Extensions and Templates.
>
>
> Thanks,
>
> Roberto
>
>
>>
>> So.... I'm coding a simple download stats app, in python, that will
>> collect together all the relevant stats and produce reports.  It
>> caches on disk JSON objects that have already been retrieved, which
>> eliminates the throttling issues as well as greatly improves
>> performance.
>>
>> Not quite done, but I'll check it in (where?) when it is fully
>> debugged and validated.  My goal is to have solid numbers for the one
>> week mark next Tuesday.  And from what I'm seeing so far, the numbers
>> will be amazing.
>>
>> But two quick questions to help me finish this:
>>
>> 1) Historically, what did OOo report as "downloads"?  Was this just a
>> count of full installs?  Or language packs as well?
>>
>> 2) It is easy to produce downloads by language and platform, since our
>> installs are already defined that way.  But I can also report
>> per-country.  Is that interesting to anyone?   For example, in Canada,
>> the most popular downloads are X, Y, Z.
>>
>>
>> -Rob
>>
>
> --
> ====
> This e- mail message is intended only for the named recipient(s) above. It
> may contain confidential and privileged information. If you are not the
> intended recipient you are hereby notified that any dissemination,
> distribution or copying of this e-mail and any attachment(s) is strictly
> prohibited. If you have received this e-mail in error, please immediately
> notify the sender by replying to this e-mail and delete the message and any
> attachment(s) from your system. Thank you.
>

Re: Download stats script (in progress)

Posted by Roberto Galoppini <rg...@geek.net>.
On Fri, May 11, 2012 at 5:06 AM, Rob Weir <ro...@apache.org> wrote:

> SourceForge has a nice REST API to query for download stats and return
> them as JSON objects.  Unfortunately, our directory structure for
> AOO 3.4 is rather odd, with English downloads in one place,
> translations in another directory, and hashes, installs and
> language packs all mixed together.  So getting these stats is a little
> painful.  You can't just get the numbers of a single directory and be
> done.  It is more complicated than that.
>
> Also, the SF API seems to be rate limited, or at least I'm getting
> errors if I query it too much.  That's understandable.
>

Rob, can you provide me with more info about this, so that we can
investigate it further?
We plan to share some stats figures next week, as we did previously for
Extensions and Templates.


Thanks,

Roberto


>
> So.... I'm coding a simple download stats app, in python, that will
> collect together all the relevant stats and produce reports.  It
> caches on disk JSON objects that have already been retrieved, which
> eliminates the throttling issues as well as greatly improves
> performance.
>
> Not quite done, but I'll check it in (where?) when it is fully
> debugged and validated.  My goal is to have solid numbers for the one
> week mark next Tuesday.  And from what I'm seeing so far, the numbers
> will be amazing.
>
> But two quick questions to help me finish this:
>
> 1) Historically, what did OOo report as "downloads"?  Was this just a
> count of full installs?  Or language packs as well?
>
> 2) It is easy to produce downloads by language and platform, since our
> installs are already defined that way.  But I can also report
> per-country.  Is that interesting to anyone?   For example, in Canada,
> the most popular downloads are X, Y, Z.
>
>
> -Rob
>



Re: Download stats script (in progress)

Posted by Louis Suárez-Potts <lu...@gmail.com>.
Rob Weir wrote:
> But two quick questions to help me finish this:
> 
> 1) Historically, what did OOo report as "downloads"?  Was this just a
> count of full installs?  Or language packs as well?

History evolved. The data deemed "download" reflected, in fact, *hits*
to the relevant pages, at first, then clicks on the links. In the last
several years, the data collected was more precise but it generally
referred to specific installation sets clicked on for download. As the
NL projects supplemented (usually) the L10n modules by providing more QA
and installation sets, the language packs as such, if I recall, grew
less urgent. Ie, why have a language pack when I could download the
ZH-TW version of OOo?

However..... in the earlier days, when we actually were counting as many
downloads as possible (and it was an inverse Red Queen's Race).... all
counted, and that meant that some things were counted more than once but
seldom more than twice, and not all things were so honoured.

So. With Bouncer and with other tools we did have a good but not
plusgood and certainly never a doubleplusgood accounting. But it was
good enough for propaganda :-).

What did in the end make the final tally were indexes of ODF use.

> 
> 2) It is easy to produce downloads by language and platform, since our
> installs are already defined that way.  But I can also report
> per-country.  Is that interesting to anyone?  

Yes.

> For example, in Canada,
> the most popular downloads are X, Y, Z.

Thanks, Rob.
Yes, the per country index was immensely desired, as it provided usually
positive feedback and thus encouragement to those who were a)
volunteering mirrors or effort or other things of immense value (first
borns?) to the cause, and b) it demonstrated to those funding these free
efforts the international value of their work, even though brand
awareness (i.e., which server you use to get the free software) was
nonexistent.

But those who managed the servers and did the immensely important work
of keeping things current... knowing where it was used was important.

I also found it important, as it helped me think of ways in which we
could manage the OOoCons without going through the easily-gamed system
we had relied upon.

Sorry for prolixity--
Louis

-- 

Louis Suárez-Potts, PhD
President, Age of Peers, Inc.

+1.416.625.3843 (m)
@luispo
GTalk: luispo@gmail.com
Skype: louisiam
Blog 1: newspeak
Blog 2: Open Source Action (and more)