Posted to dev@kibble.apache.org by Robert Munteanu <ro...@apache.org> on 2018/09/05 18:38:28 UTC

Possible to exclude directories from analysis?

Hi,

I'm using the demo Kibble instance to visualise code contributions for
the Apache Sling project. One thing I noticed is that Kibble thinks
we're 75% HTML, which is not right - we're a Java project.

I think it's due to the fact that we use gitpubsub and have registered
our github.com/apache/sling-site repository with kibble. That
repository's master branch holds all the HTML we publish, including
lots of Javadocs, Maven plug-in documentation, etc.

Is it possible to exclude a certain directory from analysis, to make
the statistics more relevant?

Thanks,

Robert


Re: Possible to exclude directories from analysis?

Posted by Robert Munteanu <ro...@apache.org>.
On Wed, 2018-09-12 at 14:37 +0200, Daniel Gruno wrote:
> top posting, yaay!
> I have a new server sort of set up now. I'll have infra redirect to
> that 
> one instead, and it'll rebuild most of the database during the night 
> (and the next night, and...).
> 
> there's a new option, a path filter in the repos tab, which filters 
> commits, line changes, trends, top contributors etc by paths
> affected, 
> so you can enter either 'jbake' to get everything touching jbake, or 
> '!jbake' to get everything that doesn't touch those files.
> 
> The CNAME should switch over some time today :)


Cool, thanks! I'll give this a shot later.

Robert

> 
> With regards,
> Daniel.
> 
> PS: We're also switching to elasticsearch 6 with this move, which is 
> going to be great, as that allows us to test on a modern ES, instead
> of 
> the old 5.x installation we're currently running on.
> 
> On 09/12/2018 12:24 PM, Daniel Gruno wrote:
> > On 09/12/2018 12:22 PM, Robert Munteanu wrote:
> > > 
> > > If you look at the sling-site repository at [1] we have the
> > > actual
> > > documentation under src/main/jbake, with
> > > - content being markdown files
> > > - templates being ... well ... templates
> > > - and assets being static files
> > > 
> > > Some of those static files are generated (javadoc, Maven plugin
> > > sites)
> > > and should not be recorded by Kibble. Those are the ones that are
> > > problematic, especially since we have a large number of javadocs
> > > committed.
> > > 
> > > $ find src/main/jbake/assets/apidocs -type f | wc -l
> > > 7127
> > > 
> > > Those are the ones we'd like excluded, if at all possible.
> > > 
> > > Thanks,
> > > 
> > > Robert
> > > 
> > > [1]: 
> > > https://github.com/apache/sling-site/tree/master/src/main/jbake
> > > 
> > > 
> > 
> > I've added a change to the scanners, so they will put a list of file
> > changes into each commit object we record. This is likely going to
> > require a complete re-scan of all things sling...which in theory
> > is 
> > fine, as I _was_ planning on moving the demo server to a new box
> > anyway. 
> > I'll let y'all know more when I have that worked out in my mind :)
> > After the move and re-scan, it should be possible to exclude by
> > file paths.
> 
> 



Re: Possible to exclude directories from analysis?

Posted by Daniel Gruno <hu...@apache.org>.
top posting, yaay!
I have a new server sort of set up now. I'll have infra redirect to that 
one instead, and it'll rebuild most of the database during the night 
(and the next night, and...).

there's a new option, a path filter in the repos tab, which filters 
commits, line changes, trends, top contributors etc by paths affected, 
so you can enter either 'jbake' to get everything touching jbake, or 
'!jbake' to get everything that doesn't touch those files.
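
For illustration, the filter semantics described above might look roughly like this in Python. This is only a sketch, not Kibble's actual implementation: the commit layout, the `files_changed` field name, the plain substring matching, and the reading of '!jbake' as "no changed file matches" are all assumptions.

```python
# Hypothetical commit records; the "files_changed" field name is an assumption.
commits = [
    {"id": "a1", "files_changed": ["src/main/jbake/content/index.md"]},
    {"id": "b2", "files_changed": ["pom.xml", "README.md"]},
    {"id": "c3", "files_changed": ["src/main/jbake/assets/apidocs/index.html", "pom.xml"]},
]

def path_filter(commits, expr):
    """Apply a 'jbake' / '!jbake' style filter using plain substring matching."""
    negate = expr.startswith("!")
    needle = expr[1:] if negate else expr
    selected = []
    for commit in commits:
        # True if at least one changed path contains the needle.
        touches = any(needle in path for path in commit["files_changed"])
        # Keep touching commits for 'x', non-touching commits for '!x'.
        if touches != negate:
            selected.append(commit)
    return selected

print([c["id"] for c in path_filter(commits, "jbake")])
print([c["id"] for c in path_filter(commits, "!jbake")])
```

Under these assumptions, a commit that touches both jbake and non-jbake files counts as "touching jbake" and is dropped by '!jbake'.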

The CNAME should switch over some time today :)

With regards,
Daniel.

PS: We're also switching to elasticsearch 6 with this move, which is 
going to be great, as that allows us to test on a modern ES, instead of 
the old 5.x installation we're currently running on.

On 09/12/2018 12:24 PM, Daniel Gruno wrote:
> On 09/12/2018 12:22 PM, Robert Munteanu wrote:
>>
>> If you look at the sling-site repository at [1] we have the actual
>> documentation under src/main/jbake, with
>> - content being markdown files
>> - templates being ... well ... templates
>> - and assets being static files
>>
>> Some of those static files are generated (javadoc, Maven plugin sites)
>> and should not be recorded by Kibble. Those are the ones that are
>> problematic, especially since we have a large number of javadocs
>> committed.
>>
>> $ find src/main/jbake/assets/apidocs -type f | wc -l
>> 7127
>>
>> Those are the ones we'd like excluded, if at all possible.
>>
>> Thanks,
>>
>> Robert
>>
>> [1]: https://github.com/apache/sling-site/tree/master/src/main/jbake
>>
>>
> 
> I've added a change to the scanners, so they will put a list of file
> changes into each commit object we record. This is likely going to 
> require a complete re-scan of all things sling...which in theory is 
> fine, as I _was_ planning on moving the demo server to a new box anyway. 
> I'll let y'all know more when I have that worked out in my mind :)
> After the move and re-scan, it should be possible to exclude by file paths.


Re: Possible to exclude directories from analysis?

Posted by Daniel Gruno <hu...@apache.org>.
On 09/12/2018 12:22 PM, Robert Munteanu wrote:
> 
> If you look at the sling-site repository at [1] we have the actual
> documentation under src/main/jbake, with
> - content being markdown files
> - templates being ... well ... templates
> - and assets being static files
> 
> Some of those static files are generated (javadoc, Maven plugin sites)
> and should not be recorded by Kibble. Those are the ones that are
> problematic, especially since we have a large number of javadocs
> committed.
> 
> $ find src/main/jbake/assets/apidocs -type f | wc -l
> 7127
> 
> Those are the ones we'd like excluded, if at all possible.
> 
> Thanks,
> 
> Robert
> 
> [1]: https://github.com/apache/sling-site/tree/master/src/main/jbake
> 
> 

I've added a change to the scanners, so they will put a list of file
changes into each commit object we record. This is likely going to 
require a complete re-scan of all things sling...which in theory is 
fine, as I _was_ planning on moving the demo server to a new box anyway. 
I'll let y'all know more when I have that worked out in my mind :)
After the move and re-scan, it should be possible to exclude by file paths.

Re: Possible to exclude directories from analysis?

Posted by Robert Munteanu <ro...@apache.org>.
On Wed, 2018-09-12 at 12:05 +0200, Daniel Gruno wrote:
> On 09/12/2018 12:00 PM, Robert Munteanu wrote:
> > On Sat, 2018-09-08 at 12:54 +0200, Daniel Gruno wrote:
> > > On 09/05/2018 08:38 PM, Robert Munteanu wrote:
> > > > Hi,
> > > > 
> > > > I'm using the demo Kibble instance to visualise code
> > > > contributions
> > > > for
> > > > the Apache Sling project. One thing I noticed is that Kibble
> > > > thinks
> > > > we're 75% HTML, which is not right - we're a Java project.
> > > > 
> > > > I think it's due to the fact that we use gitpubsub and have
> > > > registered
> > > > our github.com/apache/sling-site repository with kibble. That
> > > > repository's master branch holds all the HTML we publish,
> > > > including
> > > > lots of Javadocs, Maven plug-in documentation, etc.
> > > 
> > > The easiest path would be to simply exclude the sling-site
> > > repository
> > > in
> > > your reports. If you're using a quick filter, instead of
> > > filtering
> > > on
> > > 'sling', you could do a negative lookahead and filter on
> > > 'sling(?!-site)' as the quick filter accepts regular expressions.
> > 
> > Thanks for the suggestion. I ended up excluding the sling-site
> > repository completely from the 'Apache Sling' view. It's not ideal
> > as
> > it does not capture documentation contributions, which are quite
> > important as well.
> > 
> > It would be great if in the future we would have a more fine-grained
> > solution.
> 
> Ideal solutions are rare :)
> Could you elaborate on exactly *what* you want to see, and what you
> want 
> to filter away? Some things may be possible, but when you have to do 
> aggregations on something like 3 million commits in real-time, it
> gets 
> tricky to exclude paths and individual files without throwing a huge
> lag 
> spike into the mix.

If you look at the sling-site repository at [1] we have the actual
documentation under src/main/jbake, with 
- content being markdown files
- templates being ... well ... templates
- and assets being static files

Some of those static files are generated (javadoc, Maven plugin sites)
and should not be recorded by Kibble. Those are the ones that are
problematic, especially since we have a large number of javadocs
committed.

$ find src/main/jbake/assets/apidocs -type f | wc -l
7127

Those are the ones we'd like excluded, if at all possible.
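
To make the request concrete, here is a small Python sketch of how excluding one directory changes a per-extension share. The file list is invented, and the sketch counts files rather than lines, which may not be how Kibble computes its language breakdown; only the `src/main/jbake/assets/apidocs` prefix comes from the layout described above.

```python
from collections import Counter
from pathlib import PurePosixPath

# Invented sample paths; only the apidocs prefix is taken from the thread.
files = [
    "src/main/jbake/content/index.md",
    "src/main/jbake/content/news.md",
    "src/main/jbake/templates/page.tpl",
    "src/main/jbake/assets/apidocs/index.html",
    "src/main/jbake/assets/apidocs/overview.html",
    "src/main/jbake/assets/apidocs/allclasses.html",
]

def extension_share(paths, exclude_prefix=None):
    """Return {extension: fraction of files}, skipping paths under exclude_prefix."""
    if exclude_prefix is not None:
        # Normalise to a trailing slash so only whole directories match.
        prefix = exclude_prefix.rstrip("/") + "/"
        paths = [p for p in paths if not p.startswith(prefix)]
    counts = Counter(PurePosixPath(p).suffix for p in paths)
    total = sum(counts.values())
    return {ext: n / total for ext, n in counts.items()}

print(extension_share(files))
print(extension_share(files, "src/main/jbake/assets/apidocs"))
```

In this made-up sample the generated HTML accounts for half of the files until the apidocs directory is excluded, after which only the markdown and template files remain.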

Thanks,

Robert

[1]: https://github.com/apache/sling-site/tree/master/src/main/jbake



Re: Possible to exclude directories from analysis?

Posted by Daniel Gruno <hu...@apache.org>.
On 09/12/2018 12:00 PM, Robert Munteanu wrote:
> On Sat, 2018-09-08 at 12:54 +0200, Daniel Gruno wrote:
>> On 09/05/2018 08:38 PM, Robert Munteanu wrote:
>>> Hi,
>>>
>>> I'm using the demo Kibble instance to visualise code contributions
>>> for
>>> the Apache Sling project. One thing I noticed is that Kibble thinks
>>> we're 75% HTML, which is not right - we're a Java project.
>>>
>>> I think it's due to the fact that we use gitpubsub and have
>>> registered
>>> our github.com/apache/sling-site repository with kibble. That
>>> repository's master branch holds all the HTML we publish, including
>>> lots of Javadocs, Maven plug-in documentation, etc.
>>
>> The easiest path would be to simply exclude the sling-site repository
>> in
>> your reports. If you're using a quick filter, instead of filtering
>> on
>> 'sling', you could do a negative lookahead and filter on
>> 'sling(?!-site)' as the quick filter accepts regular expressions.
> 
> Thanks for the suggestion. I ended up excluding the sling-site
> repository completely from the 'Apache Sling' view. It's not ideal as
> it does not capture documentation contributions, which are quite
> important as well.
> 
> It would be great if in the future we would have a more fine-grained
> solution.

Ideal solutions are rare :)
Could you elaborate on exactly *what* you want to see, and what you want 
to filter away? Some things may be possible, but when you have to do 
aggregations on something like 3 million commits in real-time, it gets 
tricky to exclude paths and individual files without throwing a huge lag 
spike into the mix.

> 
> Thanks,
> 
> Robert
> 


Re: Possible to exclude directories from analysis?

Posted by Robert Munteanu <ro...@apache.org>.
On Sat, 2018-09-08 at 12:54 +0200, Daniel Gruno wrote:
> On 09/05/2018 08:38 PM, Robert Munteanu wrote:
> > Hi,
> > 
> > I'm using the demo Kibble instance to visualise code contributions
> > for
> > the Apache Sling project. One thing I noticed is that Kibble thinks
> > we're 75% HTML, which is not right - we're a Java project.
> > 
> > I think it's due to the fact that we use gitpubsub and have
> > registered
> > our github.com/apache/sling-site repository with kibble. That
> > repository's master branch holds all the HTML we publish, including
> > lots of Javadocs, Maven plug-in documentation, etc.
> 
> The easiest path would be to simply exclude the sling-site repository
> in 
> your reports. If you're using a quick filter, instead of filtering
> on 
> 'sling', you could do a negative lookahead and filter on 
> 'sling(?!-site)' as the quick filter accepts regular expressions.

Thanks for the suggestion. I ended up excluding the sling-site
repository completely from the 'Apache Sling' view. It's not ideal as
it does not capture documentation contributions, which are quite
important as well.

It would be great if in the future we would have a more fine-grained
solution.

Thanks,

Robert


Re: Possible to exclude directories from analysis?

Posted by Daniel Gruno <hu...@apache.org>.
On 09/05/2018 08:38 PM, Robert Munteanu wrote:
> Hi,
> 
> I'm using the demo Kibble instance to visualise code contributions for
> the Apache Sling project. One thing I noticed is that Kibble thinks
> we're 75% HTML, which is not right - we're a Java project.
> 
> I think it's due to the fact that we use gitpubsub and have registered
> our github.com/apache/sling-site repository with kibble. That
> repository's master branch holds all the HTML we publish, including
> lots of Javadocs, Maven plug-in documentation, etc.

The easiest path would be to simply exclude the sling-site repository in 
your reports. If you're using a quick filter, instead of filtering on 
'sling', you could do a negative lookahead and filter on 
'sling(?!-site)' as the quick filter accepts regular expressions.
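
As a rough illustration of the negative lookahead above, here is a minimal Python sketch. The repository names are made up for the example, and it assumes the quick filter behaves like an unanchored regex search, which may differ from what Kibble does internally.

```python
import re

# Hypothetical repository names, for illustration only.
repos = ["sling-site", "sling-org-apache-sling-api", "sling-whiteboard", "kibble"]

def quick_filter(pattern, names):
    """Keep the names in which the pattern matches anywhere (assumed semantics)."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]

# Plain 'sling' keeps every sling repository, including sling-site.
print(quick_filter("sling", repos))

# 'sling(?!-site)' only matches 'sling' when it is not followed by '-site'.
print(quick_filter("sling(?!-site)", repos))
```

Note that the lookahead only rejects the position where 'sling' is immediately followed by '-site'; a name containing a second 'sling' elsewhere would still match.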

With regards,
Daniel.

> 
> Is it possible to exclude a certain directory from analysis, to make
> the statistics more relevant?
> 
> Thanks,
> 
> Robert
>