Posted to dev@community.apache.org by Daniel Gruno <hu...@apache.org> on 2016/10/26 11:07:37 UTC

Adding some statistics to projects.a.o?

Hi folks,
I was wondering, since we have full access to Snoot for the ASF, why not
take advantage of that and add a statistics page to projects.apache.org,
showing the various live stats available (no. of commits/committers,
largest repos by size/commits, proper language breakdown, relationship
mapping, mail stats etc).

I was inclined to JFDI, but I'd love to hear what others think about
this. If I don't hear any loud objections, I'll add a stats page today,
and we can see if it's of any use :)

Comments? Suggestions? :)

With regards,
Daniel.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@community.apache.org
For additional commands, e-mail: dev-help@community.apache.org


Re: Adding some statistics to projects.a.o?

Posted by Hervé BOUTEMY <he...@free.fr>.
Le vendredi 28 octobre 2016, 10:27:31 CEST Daniel Gruno a écrit :
> > I have a snoot account, then I could have a look at the list of repos that
> > are taken into account. I have a few questions:
> > 1. can we show the list of repos from this statistics page?
> 
> Do you mean the entire list of repos analysed? I'm sure we could, I just
> don't quite know how we'd present it :) It's a rather large list.
yes, I mean the full list (740 repositories, Snoot says :) )
I know that making it appealing will require some work, but I think that it's 
important to make that information visible for people wanting to dig into 
these stats, and even help us fix issues on Snoot config

I'm even sure we could try to add this list of repos, split by committee, on 
each committee page: that would let each PMC see its Snoot config and, 
once again, help to fix issues

> 
> > 2. I saw that some imports are failing, because list of repos change over
> > time: how can I help fix issues?
> 
> If you're up for keeping the list updated, speak to Sally about getting
> admin privs on Snoot, I'm sure she'll be happy to have someone help out :)
great, I'll do that, thanks

Regards,

Hervé



Re: Adding some statistics to projects.a.o?

Posted by Daniel Gruno <hu...@apache.org>.
On 10/28/2016 10:17 AM, Hervé BOUTEMY wrote:
> IIUC, this "Largest/Busiest projects" statistic is neither per project 
> nor per committee (or PMC), but per repo
> 
> Note: 1 committee (or PMC) = n projects [1]
> and 1 committee may have many repos
> 
> I'll update the title to "Largest/Busiest repos", that will be less 
> misleading.

Thanks!
> 
> 
> I have a Snoot account, so I could have a look at the list of repos that are 
> taken into account. I have a few questions:
> 1. can we show the list of repos from this statistics page?

Do you mean the entire list of repos analysed? I'm sure we could, I just
don't quite know how we'd present it :) It's a rather large list.

> 2. I saw that some imports are failing, because list of repos change over 
> time: how can I help fix issues?

If you're up for keeping the list updated, speak to Sally about getting
admin privs on Snoot, I'm sure she'll be happy to have someone help out :)

With regards,
Daniel.



Re: Adding some statistics to projects.a.o?

Posted by Hervé BOUTEMY <he...@free.fr>.
IIUC, this "Largest/Busiest projects" statistic is neither per project 
nor per committee (or PMC), but per repo

Note: 1 committee (or PMC) = n projects [1]
and 1 committee may have many repos

I'll update the title to "Largest/Busiest repos", that will be less 
misleading.


I have a Snoot account, so I could have a look at the list of repos that are 
taken into account. I have a few questions:
1. can we show the list of repos from this statistics page?
2. I saw that some imports are failing, because the list of repos changes over 
time: how can I help fix issues?

Regards,

Hervé

[1] https://projects.apache.org/projects.html?pmc

Le mercredi 26 octobre 2016, 21:28:17 CEST Daniel Gruno a écrit :
> On 10/26/2016 09:06 PM, Mike Drob wrote:
> > A few section specific comments -
> > Largest/Busiest projects is difficult to make use of due to the huge
> > "other" section. Maybe a list makes more sense rather than a pie/circle
> > chart.
> > Email, topics and email authors, past year -- more readable as a line
> > chart and for a longer time span I think
> 
> Changing the email stats to lines was rather straightforward, so I've
> done that. I also changed it to just show stats for user/dev lists,
> leaving out the issues/commit lists which are rather chatty but not
> representative of email-based discussions. Changing the top repos by
> sloc/commits will require some time, as I'll have to write some custom
> representation for that.
> 
> With regards,
> Daniel.


Re: Adding some statistics to projects.a.o?

Posted by Daniel Gruno <hu...@apache.org>.
On 10/26/2016 09:06 PM, Mike Drob wrote:
> A few section specific comments -
> Largest/Busiest projects is difficult to make use of due to the huge
> "other" section. Maybe a list makes more sense rather than a pie/circle
> chart.
> Email, topics and email authors, past year -- more readable as a line
> chart and for a longer time span I think

Changing the email stats to lines was rather straightforward, so I've
done that. I also changed it to just show stats for user/dev lists,
leaving out the issues/commit lists which are rather chatty but not
representative of email-based discussions. Changing the top repos by
sloc/commits will require some time, as I'll have to write some custom
representation for that.

With regards,
Daniel.



Re: Adding some statistics to projects.a.o?

Posted by Mike Drob <md...@apache.org>.
A few section specific comments -
Largest/Busiest projects is difficult to make use of due to the huge
"other" section. Maybe a list makes more sense rather than a pie/circle
chart.
Email, topics and email authors, past year -- more readable as a line
chart and for a longer time span I think


On Wed, Oct 26, 2016 at 1:07 PM, Daniel Gruno <hu...@apache.org> wrote:

> I added an initial stats page at
> https://projects.apache.org/statistics.html - assuming no one objects,
> I'll add it to the top menu of the other pages in a day or so.
>
> Do peruse - anything we need to add/edit?
>
> With regards,
> Daniel.

Re: Adding some statistics to projects.a.o?

Posted by Daniel Gruno <hu...@apache.org>.
On 10/26/2016 08:25 PM, Ross Gardler wrote:
> The problem I see is that it looks like we are flatlining. Of course, what is happening is that there are growing projects, stable projects and shrinking projects; together it looks flat.
> 
> I find it misrepresentative as a result.

I actually find it very impressive that we have managed to keep up the
momentum :). As you rightly say, some projects are maturing, a few are
going away, some are growing and some are brand new with full steam ahead.

Can I assume your specific worry is about the "commits, past year"
chart? We can change that to either an all-time view to show the
contrast with earlier years, or scrap it, I have no issue with either
solution, though I also have no personal concern over keeping it as is.

With regards,
Daniel.



Re: Adding some statistics to projects.a.o?

Posted by Ross Gardler <Ro...@microsoft.com>.
The problem I see is that it looks like we are flatlining. Of course, what is happening is that there are growing projects, stable projects and shrinking projects; together it looks flat.

I find it misrepresentative as a result.

Ross

---
Twitter: @rgardler

________________________________
From: Daniel Gruno <hu...@apache.org>
Sent: Wednesday, October 26, 2016 11:07:45 AM
To: dev@community.apache.org
Subject: Re: Adding some statistics to projects.a.o?

I added an initial stats page at
https://projects.apache.org/statistics.html - assuming no one objects,
I'll add it to the top menu of the other pages in a day or so.

Do peruse - anything we need to add/edit?

With regards,
Daniel.



Re: Adding some statistics to projects.a.o?

Posted by Hervé BOUTEMY <he...@free.fr>.
Le vendredi 28 octobre 2016, 10:41:37 CEST Daniel Gruno a écrit :
> > For Maven, the only option I see is pom.xml files: how can we confirm
> > this?
> > And confirm if language breakdowns counts files only, or weighted with
> > file size or with another weight?
> 
> It uses the same heuristics as CLoC (with a few modifications for
> increased stability), so you could run that locally and see why it does
> what it does.
ok, then here is the doc I'll add a pointer to
https://github.com/AlDanial/cloc#Languages

> language analysers are never 100% accurate, OpenHub's
> analyser is famous for making odd claims about Forth, and GitHub's
> downplays Python in many projects, the list goes on :)
> 
> Counts are lines of code, there is no weighting going on there. it's
> just raw figures.
> 
> > Is Snoot open sourced somewhere?
> 
> It's about as open as GitHub, OpenHub, Masterbranch etc :) The interface
> for the system is public and documented (under documentation/exports),
> but the internal systems on the boxes are proprietary for the most part.
great, all-sources is the API I needed to integrate content into projects.a.o 
build: we'll need to sort out which token to use, but at least I can start to 
work on interpreting results fetched by hand

https://api.snoot.io/api/3/api-docs#all-sources




Re: Adding some statistics to projects.a.o?

Posted by Daniel Gruno <hu...@apache.org>.
On 10/28/2016 10:30 AM, Hervé BOUTEMY wrote:
> Le mercredi 26 octobre 2016, 23:12:44 CEST Daniel Gruno a écrit :
>> On 10/26/2016 10:56 PM, Phil Steitz wrote:
>>> On 10/26/16 11:07 AM, Daniel Gruno wrote:
>>>> I added an initial stats page at
>>>> https://projects.apache.org/statistics.html - assuming no one objects,
>>>> I'll add it to the top menu of the other pages in a day or so.
>>>>
>>>> Do peruse - anything we need to add/edit?
>>>
>>> Maven is not a programming language.  What exactly is the
>>> denominator on that stat?  Number of files?  Lines of code?
>>> Projects primarily using?
>>
>> I suspect it's scripts specifically for maven it's counting. the
>> denominator is lines of functional code (101 million in total, not
>> counting blanks and comments which take us to 150M total).
> For Maven, the only option I see is pom.xml files: how can we confirm this?
> And confirm if language breakdowns counts files only, or weighted with file size 
> or with another weight?

It uses the same heuristics as CLoC (with a few modifications for
increased stability), so you could run that locally and see why it does
what it does. Language analysers are never 100% accurate: OpenHub's
analyser is famous for making odd claims about Forth, and GitHub's
downplays Python in many projects; the list goes on :)

Counts are lines of code; there is no weighting going on there. It's
just raw figures.
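
For anyone who wants to check a language breakdown locally, here is a minimal sketch of turning cloc's JSON report into unweighted per-language shares of code lines. It assumes the report was produced with `cloc --json`, that per-language entries carry `blank`, `comment` and `code` counts, and that `header` and `SUM` are bookkeeping entries. The function name `code_share` and the sample numbers are made up for illustration, and Snoot's modified heuristics may give different figures.

```python
import json


def code_share(cloc_json: str) -> dict:
    """Return each language's share of functional code lines (no weighting)."""
    report = json.loads(cloc_json)
    # Skip the bookkeeping entries that sit next to the per-language ones.
    languages = {
        name: stats
        for name, stats in report.items()
        if name not in ("header", "SUM")
    }
    total = sum(stats["code"] for stats in languages.values())
    if total == 0:
        return {}
    # Blank and comment lines are ignored: only "code" counts.
    return {name: stats["code"] / total for name, stats in languages.items()}


# Illustrative report, not real ASF data.
sample = json.dumps({
    "header": {"n_files": 3, "n_lines": 160},
    "Java": {"nFiles": 2, "blank": 20, "comment": 30, "code": 90},
    "Maven": {"nFiles": 1, "blank": 5, "comment": 5, "code": 10},
    "SUM": {"blank": 25, "comment": 35, "code": 100, "nFiles": 3},
})
shares = code_share(sample)
for language, share in sorted(shares.items()):
    print(f"{language}: {share:.0%}")
```

To try it on a real checkout, something like `cloc --json path/to/checkout > report.json` and then passing the file's contents to `code_share` should work. If a "Maven" entry shows up, that is likely cloc's label for pom.xml files, but it is worth confirming against cloc's own language list.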

> Is Snoot open sourced somewhere?

It's about as open as GitHub, OpenHub, Masterbranch etc :) The interface
for the system is public and documented (under documentation/exports),
but the internal systems on the boxes are proprietary for the most part.

With regards,
Daniel.



Re: Adding some statistics to projects.a.o?

Posted by Hervé BOUTEMY <he...@free.fr>.
Le mercredi 26 octobre 2016, 23:12:44 CEST Daniel Gruno a écrit :
> On 10/26/2016 10:56 PM, Phil Steitz wrote:
> > On 10/26/16 11:07 AM, Daniel Gruno wrote:
> >> I added an initial stats page at
> >> https://projects.apache.org/statistics.html - assuming no one objects,
> >> I'll add it to the top menu of the other pages in a day or so.
> >> 
> >> Do peruse - anything we need to add/edit?
> > 
> > Maven is not a programming language.  What exactly is the
> > denominator on that stat?  Number of files?  Lines of code?
> > Projects primarily using?
> 
> I suspect it's scripts specifically for maven it's counting. the
> denominator is lines of functional code (101 million in total, not
> counting blanks and comments which take us to 150M total).
For Maven, the only option I see is pom.xml files: how can we confirm this?
And can we confirm whether the language breakdown counts files only, or is 
weighted by file size or by some other weight?
Is Snoot open sourced somewhere?

Regards,

Hervé



Re: Adding some statistics to projects.a.o?

Posted by Daniel Gruno <hu...@apache.org>.
On 10/27/2016 12:20 AM, sebb wrote:
> These explanations of what the stats mean need to be provided on
> the page or linked from it.

Right, perhaps below/above each of them would be a good idea. I'll get
working on that tomorrow.

> 
> On 26 October 2016 at 22:12, Daniel Gruno <hu...@apache.org> wrote:
>> On 10/26/2016 10:56 PM, Phil Steitz wrote:
>>> On 10/26/16 11:07 AM, Daniel Gruno wrote:
>>>> I added an initial stats page at
>>>> https://projects.apache.org/statistics.html - assuming no one objects,
>>>> I'll add it to the top menu of the other pages in a day or so.
>>>>
>>>> Do peruse - anything we need to add/edit?
>>>
>>> Maven is not a programming language.  What exactly is the
>>> denominator on that stat?  Number of files?  Lines of code?
>>> Projects primarily using?
>>
>> I suspect it's scripts specifically for maven it's counting. the
>> denominator is lines of functional code (101 million in total, not
>> counting blanks and comments which take us to 150M total).
>>
>>>
>>> What does lines changed mean?  It looks like lines changed is
>>> somehow supposed to be insertions plus deletions.  Where are the
>>> mods to lines?  Is this just counting -- and ++ out of diffs?  That
>>> is a very bad metric on how much code has actually changed or what a
>>> contribution is.  Formatting nits, creating RCs, etc generate huge
>>> amounts of this stuff without really contributing much.
>>
>> AIUI, the huge ++/-- are weeded out in these charts, otherwise it would
>> be in the millions of lines of code changed some days. We have, on
>> average, 700-800 commits per business day to our repos, and with roughly
>> 100k additions according to the chart, that would indicate an average of
>> ~125 lines changed per commit. It's very possible that this includes
>> some automatic changes, I can't say. As they are somewhat static, I am
>> considering just scrapping that part, it probably doesn't show that much
>> of value to us.
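
The per-commit estimate quoted above can be checked with quick arithmetic. This is only a sketch: it assumes the "roughly 100k additions" figure is per business day, which the message does not state explicitly, and the names are illustrative.

```python
def lines_per_commit(additions_per_day: int, commits_per_day: int) -> float:
    """Average number of added lines per commit for one business day."""
    return additions_per_day / commits_per_day


ADDITIONS_PER_DAY = 100_000  # "roughly 100k additions", assumed to be per day

for commits in (700, 800):  # the quoted range of commits per business day
    average = lines_per_commit(ADDITIONS_PER_DAY, commits)
    print(f"{commits} commits/day -> ~{average:.0f} lines per commit")
```

Under that assumption, the ~125 figure matches the upper end of the range (800 commits a day). At 700 commits a day the average is closer to 143.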
>>
>>>
>>> What in the heck is an "author?"  We eliminated @author tags years
>>> ago because *we don't think like that* - lets not regress.  If it
>>> means someone created a new file, what is different about that than
>>> just committing a patch of some kind?  I would drop that metric or
>>> just merge it into committers.
>>
>> An author in this context is someone who authored a piece of code, a
>> committer is someone who committed the code to a repository. They need
>> not be the same person. In Subversion, they are the same, as svn does
>> not distinguish. In git, they are two different entities. Committers are
>> always ASF committers, authors can be any contributor to a project with
>> or without an apache account.
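
The author/committer distinction described above is visible in any git repository. Here is a small sketch that counts commits where the two differ, using git's `%an` (author name) and `%cn` (committer name) format placeholders. The helper names and the sample log are made up for illustration, and this is not necessarily how Snoot derives its numbers.

```python
import subprocess


def read_author_committer_log(repo_path: str) -> str:
    """Return one 'author<TAB>committer' line per commit in the repository."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an%x09%cn"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_third_party_commits(log_text: str) -> int:
    """Count commits whose author is not the person who committed them."""
    count = 0
    for line in log_text.splitlines():
        if not line.strip():
            continue
        author, committer = line.split("\t", 1)
        if author != committer:
            count += 1
    return count


# Illustrative log: Bob's and Carol's patches were committed by someone else.
sample_log = "Alice\tAlice\nBob\tAlice\nCarol\tDave\n"
print(count_third_party_commits(sample_log))
```

On a real checkout, `count_third_party_commits(read_author_committer_log("path/to/repo"))` would give the same count for that repository. In a repository converted from Subversion the two names will normally match, as the message says, since svn does not record them separately.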
>>
>>>
>>> I very much do not like the "leader board" concept, especially with
>>> a bogus metric like number of diff lines generated driving it.  I
>>> would drop that thing.
>>
>> It's number of unique commits driving it, not number of diffs - that's a
>> secondary statistic. While we disagree on liking this, I'll definitely
>> take it under advisement as I work on the page. Note, it's not been made
>> public in the sense that the front page links to it just yet, I'll do
>> that once we are more aligned idea-wise.
>>
>>>
>>> I would rather see "busiest" or "most active" projects defined by
>>> something more meaningful like number of issues resolved or number
>>> of releases.   So change at least the first metric on the bottom to
>>> number of issues resolved and maybe make the second one number of
>>> releases.
>>
>> Number of releases would be nigh impossible, as we don't really keep
>> score of that, at all. Issues solved could be done easily, though we
>> don't have any formal mapping from issue tracker names back to our
>> projects, so it would probably show which JIRA/BZ instances are the most
>> active instead.
>>
>> With regards,
>> Daniel.
>>
>>>
>>> Phil
>>>
>>>
>>>
>>>
>>>>
>>>> With regards,
>>>> Daniel.
>>>>
>>>> On 10/26/2016 01:07 PM, Daniel Gruno wrote:
>>>>> Hi folks,
>>>>> I was wondering, since we have full access to Snoot for the ASF, why not
>>>>> take advantage of that and add a statistics page to projects.apache.org,
>>>>> showing the various live stats available (no. of commits/committers,
>>>>> largest repos by size/commits, proper language breakdown, relationship
>>>>> mapping, mail stats etc).
>>>>>
>>>>> I was inclined to JFDI, but I'd love to hear what others think about
>>>>> this. If I don't hear any loud objections, I'll add a stats page today,
>>>>> and we can see if it's of any use :)
>>>>>
>>>>> Comments? Suggestions? :)
>>>>>
>>>>> With regards,
>>>>> Daniel.
>>>>>


Re: Adding some statistics to projects.a.o?

Posted by sebb <se...@gmail.com>.
These explanations of what the stats mean need to be provided on
the page or linked from it.

On 26 October 2016 at 22:12, Daniel Gruno <hu...@apache.org> wrote:
> On 10/26/2016 10:56 PM, Phil Steitz wrote:
>> On 10/26/16 11:07 AM, Daniel Gruno wrote:
>>> I added an initial stats page at
>>> https://projects.apache.org/statistics.html - assuming no one objects,
>>> I'll add it to the top menu of the other pages in a day or so.
>>>
>>> Do peruse - anything we need to add/edit?
>>
>> Maven is not a programming language.  What exactly is the
>> denominator on that stat?  Number of files?  Lines of code?
>> Projects primarily using?
>
> I suspect it's counting scripts written specifically for Maven. The
> denominator is lines of functional code (101 million in total, not
> counting blanks and comments, which take us to 150M total).
>
>>
>> What does lines changed mean?  It looks like lines changed is
>> somehow supposed to be insertions plus deletions.  Where are the
>> mods to lines?  Is this just counting -- and ++ out of diffs?  That
>> is a very bad metric on how much code has actually changed or what a
>> contribution is.  Formatting nits, creating RCs, etc generate huge
>> amounts of this stuff without really contributing much.
>
> AIUI, the huge ++/-- are weeded out in these charts, otherwise it would
> be in the millions of lines of code changed some days. We have, on
> average, 700-800 commits per business day to our repos, and with roughly
> 100k additions according to the chart, that would indicate an average of
> ~125 lines changed per commit. It's very possible that this includes
> some automatic changes, I can't say. As they are somewhat static, I am
> considering just scrapping that part, it probably doesn't show that much
> of value to us.
>
>>
>> What in the heck is an "author?"  We eliminated @author tags years
>> ago because *we don't think like that* - let's not regress.  If it
>> means someone created a new file, what is different about that than
>> just committing a patch of some kind?  I would drop that metric or
>> just merge it into committers.
>
> An author in this context is someone who authored a piece of code, a
> committer is someone who committed the code to a repository. They need
> not be the same person. In Subversion, they are the same, as svn does
> not distinguish. In git, they are two different entities. Committers are
> always ASF committers, authors can be any contributor to a project with
> or without an apache account.
>
>>
>> I very much do not like the "leader board" concept, especially with
>> a bogus metric like number of diff lines generated driving it.  I
>> would drop that thing.
>
> It's number of unique commits driving it, not number of diffs - that's a
> secondary statistic. While we disagree on liking this, I'll definitely
> take it under advisement as I work on the page. Note, it's not been made
> public in the sense that the front page links to it just yet, I'll do
> that once we are more aligned idea-wise.
>
>>
>> I would rather see "busiest" or "most active" projects defined by
>> something more meaningful like number of issues resolved or number
>> of releases.   So change at least the first metric on the bottom to
>> number of issues resolved and maybe make the second one number of
>> releases.
>
> Number of releases would be nigh impossible, as we don't really keep
> score of that, at all. Issues solved could be done easily, though we
> don't have any formal mapping from issue tracker names back to our
> projects, so it would probably show which JIRA/BZ instances are the most
> active instead.
>
> With regards,
> Daniel.
>
>>
>> Phil
>>
>>
>>
>>
>>>
>>> With regards,
>>> Daniel.
>>>
>>> On 10/26/2016 01:07 PM, Daniel Gruno wrote:
>>>> Hi folks,
>>>> I was wondering, since we have full access to Snoot for the ASF, why not
>>>> take advantage of that and add a statistics page to projects.apache.org,
>>>> showing the various live stats available (no. of commits/committers,
>>>> largest repos by size/commits, proper language breakdown, relationship
>>>> mapping, mail stats etc).
>>>>
>>>> I was inclined to JFDI, but I'd love to hear what others think about
>>>> this. If I don't hear any loud objections, I'll add a stats page today,
>>>> and we can see if it's of any use :)
>>>>
>>>> Comments? Suggestions? :)
>>>>
>>>> With regards,
>>>> Daniel.
>>>>


Re: Adding some statistics to projects.a.o?

Posted by Daniel Gruno <hu...@apache.org>.
On 10/26/2016 10:56 PM, Phil Steitz wrote:
> On 10/26/16 11:07 AM, Daniel Gruno wrote:
>> I added an initial stats page at
>> https://projects.apache.org/statistics.html - assuming no one objects,
>> I'll add it to the top menu of the other pages in a day or so.
>>
>> Do peruse - anything we need to add/edit?
> 
> Maven is not a programming language.  What exactly is the
> denominator on that stat?  Number of files?  Lines of code? 
> Projects primarily using?

I suspect it's counting scripts written specifically for Maven. The
denominator is lines of functional code (101 million in total, not
counting blanks and comments, which take us to 150M total).
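
To make the "lines of functional code" denominator concrete, here is a minimal sketch of how lines might be sorted into functional, blank, and comment lines. It is an illustration only, not how Snoot counts: the function name `count_lines` is hypothetical, and it assumes whole-line comments that start with a single prefix (`#` by default).

```python
def count_lines(source: str, comment_prefix: str = "#") -> dict:
    """Classify each line as functional, blank, or comment.

    Assumption: a comment is a line whose first non-space characters
    are `comment_prefix`. Trailing comments count as functional code.
    """
    counts = {"functional": 0, "blank": 0, "comment": 0}
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["functional"] += 1
    return counts


# Hypothetical four-line input, used only to exercise the function.
sample = "# build script\n\nx = 1\ny = x + 1  # trailing comment\n"
counts = count_lines(sample)
print(counts)
```

With the totals quoted above, functional lines would be 101M of 150M, or roughly two thirds of all lines.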

> 
> What does lines changed mean?  It looks like lines changed is
> somehow supposed to be insertions plus deletions.  Where are the
> mods to lines?  Is this just counting -- and ++ out of diffs?  That
> is a very bad metric on how much code has actually changed or what a
> contribution is.  Formatting nits, creating RCs, etc generate huge
> amounts of this stuff without really contributing much.

AIUI, the huge ++/-- are weeded out in these charts; otherwise it would
be in the millions of lines of code changed some days. We have, on
average, 700-800 commits per business day to our repos, and with roughly
100k additions according to the chart, that would indicate an average of
~125 lines changed per commit. It's very possible that this includes
some automatic changes, I can't say. As they are somewhat static, I am
considering just scrapping that part, as it probably doesn't show much
of value to us.
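
The per-commit estimate can be checked with a few lines of arithmetic. This is a rough sketch that uses only the figures quoted above and assumes the 100k additions figure covers the same period (one business day) as the commit count; the variable and function names are illustrative.

```python
# Figures quoted in the message; the names are illustrative.
additions_per_day = 100_000         # "roughly 100k additions"
commits_per_day_range = (700, 800)  # "700-800 commits per business day"


def lines_per_commit(additions: int, commits: int) -> float:
    """Average number of added lines per commit."""
    return additions / commits


# More commits per day means fewer lines per commit, so the upper
# commit count gives the lower bound of the estimate.
low = lines_per_commit(additions_per_day, commits_per_day_range[1])
high = lines_per_commit(additions_per_day, commits_per_day_range[0])
print(f"{low:.0f} to {high:.0f} lines per commit")
```

The ~125 figure corresponds to the 800-commit end of the range; at 700 commits per day the average would be closer to 143.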

> 
> What in the heck is an "author?"  We eliminated @author tags years
> ago because *we don't think like that* - let's not regress.  If it
> means someone created a new file, what is different about that than
> just committing a patch of some kind?  I would drop that metric or
> just merge it into committers.

An author in this context is someone who authored a piece of code; a
committer is someone who committed the code to a repository. They need
not be the same person. In Subversion, they are the same, as svn does
not distinguish. In git, they are two different entities. Committers are
always ASF committers; authors can be any contributor to a project, with
or without an Apache account.
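
The author/committer split in git can be seen directly in the log. Below is a minimal sketch, assuming input produced by `git log --format='%an%x09%cn'` (author name, a tab, committer name on each line). The `summarize` function and the sample names are hypothetical, and this is not necessarily how the stats page derives its numbers.

```python
from typing import NamedTuple


class Summary(NamedTuple):
    authors: set
    committers: set
    differing_commits: int


def summarize(log_text: str) -> Summary:
    """Collect unique authors and committers from tab-separated log lines."""
    authors, committers, differing = set(), set(), 0
    for line in log_text.splitlines():
        if not line.strip():
            continue
        author, committer = line.split("\t", 1)
        authors.add(author)
        committers.add(committer)
        if author != committer:
            # Someone else's patch was applied by a committer.
            differing += 1
    return Summary(authors, committers, differing)


# Hypothetical log: bob commits his own work and patches from two others.
sample = "alice\tbob\nbob\tbob\ncarol\tbob\n"
result = summarize(sample)
print(len(result.authors), len(result.committers), result.differing_commits)
```

In a Subversion-backed repository the two columns would always match, which is why the distinction only shows up for git.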

> 
> I very much do not like the "leader board" concept, especially with
> a bogus metric like number of diff lines generated driving it.  I
> would drop that thing.

It's the number of unique commits driving it, not the number of diffs - that's a
secondary statistic. While we disagree on liking this, I'll definitely
take it under advisement as I work on the page. Note that it has not been
made public yet, in the sense that the front page does not link to it;
I'll do that once we are more aligned idea-wise.

> 
> I would rather see "busiest" or "most active" projects defined by
> something more meaningful like number of issues resolved or number
> of releases.   So change at least the first metric on the bottom to
> number of issues resolved and maybe make the second one number of
> releases.

Number of releases would be nigh impossible, as we don't really keep
score of that, at all. Issues solved could be done easily, though we
don't have any formal mapping from issue tracker names back to our
projects, so it would probably show which JIRA/BZ instances are the most
active instead.

With regards,
Daniel.

> 
> Phil
> 
> 
> 
> 
>>
>> With regards,
>> Daniel.
>>
>> On 10/26/2016 01:07 PM, Daniel Gruno wrote:
>>> Hi folks,
>>> I was wondering, since we have full access to Snoot for the ASF, why not
>>> take advantage of that and add a statistics page to projects.apache.org,
>>> showing the various live stats available (no. of commits/committers,
>>> largest repos by size/commits, proper language breakdown, relationship
>>> mapping, mail stats etc).
>>>
>>> I was inclined to JFDI, but I'd love to hear what others think about
>>> this. If I don't hear any loud objections, I'll add a stats page today,
>>> and we can see if it's of any use :)
>>>
>>> Comments? Suggestions? :)
>>>
>>> With regards,
>>> Daniel.
>>>


Re: Adding some statistics to projects.a.o?

Posted by Phil Steitz <ph...@gmail.com>.
On 10/26/16 11:07 AM, Daniel Gruno wrote:
> I added an initial stats page at
> https://projects.apache.org/statistics.html - assuming no one objects,
> I'll add it to the top menu of the other pages in a day or so.
>
> Do peruse - anything we need to add/edit?

Maven is not a programming language.  What exactly is the
denominator on that stat?  Number of files?  Lines of code? 
Projects primarily using?

What does lines changed mean?  It looks like lines changed is
somehow supposed to be insertions plus deletions.  Where are the
mods to lines?  Is this just counting -- and ++ out of diffs?  That
is a very bad metric on how much code has actually changed or what a
contribution is.  Formatting nits, creating RCs, etc generate huge
amounts of this stuff without really contributing much.
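
For reference, a unified diff has no separate notion of a modified line: a changed line appears as one deletion plus one insertion. A minimal sketch of an "insertions plus deletions" count follows, assuming input in the format of `git diff --numstat` (added, deleted, and path separated by tabs, with `-` in both count columns for binary files). The function name `lines_changed` and the sample paths are hypothetical, and whether the chart is computed this way is not confirmed in this thread.

```python
def lines_changed(numstat_text: str) -> int:
    """Sum insertions and deletions over all text files in numstat output."""
    total = 0
    for line in numstat_text.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip anything that is not a numstat row
        added, deleted, _path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of counts
        total += int(added) + int(deleted)
    return total


# Hypothetical numstat output: one edited file, one binary, one cleanup.
sample = "3\t1\tsrc/Main.java\n-\t-\tlogo.png\n0\t10\tREADME\n"
total = lines_changed(sample)
print(total)
```

A pure reformatting commit would score high on this count even though nothing functional changed, which is the weakness described above.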

What in the heck is an "author?"  We eliminated @author tags years
ago because *we don't think like that* - let's not regress.  If it
means someone created a new file, what is different about that than
just committing a patch of some kind?  I would drop that metric or
just merge it into committers.

I very much do not like the "leader board" concept, especially with
a bogus metric like number of diff lines generated driving it.  I
would drop that thing.

I would rather see "busiest" or "most active" projects defined by
something more meaningful like number of issues resolved or number
of releases.   So change at least the first metric on the bottom to
number of issues resolved and maybe make the second one number of
releases.

Phil




>
> With regards,
> Daniel.
>
> On 10/26/2016 01:07 PM, Daniel Gruno wrote:
>> Hi folks,
>> I was wondering, since we have full access to Snoot for the ASF, why not
>> take advantage of that and add a statistics page to projects.apache.org,
>> showing the various live stats available (no. of commits/committers,
>> largest repos by size/commits, proper language breakdown, relationship
>> mapping, mail stats etc).
>>
>> I was inclined to JFDI, but I'd love to hear what others think about
>> this. If I don't hear any loud objections, I'll add a stats page today,
>> and we can see if it's of any use :)
>>
>> Comments? Suggestions? :)
>>
>> With regards,
>> Daniel.
>>


Re: Adding some statistics to projects.a.o?

Posted by Daniel Gruno <hu...@apache.org>.
I added an initial stats page at
https://projects.apache.org/statistics.html - assuming no one objects,
I'll add it to the top menu of the other pages in a day or so.

Do peruse - anything we need to add/edit?

With regards,
Daniel.

On 10/26/2016 01:07 PM, Daniel Gruno wrote:
> Hi folks,
> I was wondering, since we have full access to Snoot for the ASF, why not
> take advantage of that and add a statistics page to projects.apache.org,
> showing the various live stats available (no. of commits/committers,
> largest repos by size/commits, proper language breakdown, relationship
> mapping, mail stats etc).
> 
> I was inclined to JFDI, but I'd love to hear what others think about
> this. If I don't hear any loud objections, I'll add a stats page today,
> and we can see if it's of any use :)
> 
> Comments? Suggestions? :)
> 
> With regards,
> Daniel.
> 


Re: Adding some statistics to projects.a.o?

Posted by muktesh mishra <mu...@hotmail.com>.
Seems to be a good idea to me.

-Muktesh


> On Oct 26, 2016, at 4:07 AM, Daniel Gruno <hu...@apache.org> wrote:
> 
> Hi folks,
> I was wondering, since we have full access to Snoot for the ASF, why not
> take advantage of that and add a statistics page to projects.apache.org,
> showing the various live stats available (no. of commits/committers,
> largest repos by size/commits, proper language breakdown, relationship
> mapping, mail stats etc).
> 
> I was inclined to JFDI, but I'd love to hear what others think about
> this. If I don't hear any loud objections, I'll add a stats page today,
> and we can see if it's of any use :)
> 
> Comments? Suggestions? :)
> 
> With regards,
> Daniel.
> 