Posted to general@gump.apache.org by Stefan Bodewig <bo...@apache.org> on 2004/02/05 15:22:08 UTC

Re: Nagging the cause vs. Nagging the effect

On Thu, 5 Feb 2004, Stefano Mazzocchi <st...@apache.org> wrote:
> On 5 Feb 2004, at 02:17, Adam Jack wrote:
> 
>> I've long wished that Gump could nag the 'cause' of a problem, not
>> the 'effect', but it is (AFAICT) pretty much impossible to guess
>> who the cause is from a compile failure.
> 
> Tell you what: there have been looooong discussions about this, and I
> spent endless hours on the whiteboard trying to figure out *where*
> that data can emerge out of the entire mass of data that Gump is
> either collecting or generating.
> 
> I was still not able to find it, still not able to come up with a
> general algorithm that would, if not identify the cause, at least
> discriminate between "causing trends" and "affected trends".

The way you'd do it manually is roughly as follows, I believe:

* start with the last good build

* replace the dependencies one at a time with their latest versions
  and rebuild - repeat until the build fails (see the sketch below).

Gumpy should have enough data to do that, but the whole thing breaks
down when Gump has been unable to build a project for weeks or even
months as the number of changes becomes too big.
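
A minimal sketch of that search, in Python since Gumpy itself is
Python.  Everything named here is hypothetical: build_with() stands in
for "run this project's build against the given dependency versions",
and the two version maps stand in for data a Gump run would have to
record - none of it is an actual Gumpy API.

    def find_breaking_dependency(last_good_versions, latest_versions,
                                 build_with):
        """Return the first dependency whose update breaks the build,
        or None if everything updates and the build stays good."""
        versions = dict(last_good_versions)  # start from the last good build
        for dep, latest in sorted(latest_versions.items()):
            if versions.get(dep) == latest:
                continue                     # this dependency did not change
            versions[dep] = latest           # advance exactly one dependency
            if not build_with(versions):
                return dep                   # this update broke the build
        return None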

> I think the key is that the Gump runs, whether done by Gump or Gumpy,
> do *not* contain enough information. But if we had both:
> 
>   1) the latest dependency run
>   2) the stable dependency run
> 
> and we had enough history of these (say a few months), I'm pretty
> sure the data *IS* there.

I think Gumpy already collects, or at least could collect, historical
data for all dependency runs.  If the dependency run has been
successful at least once, the data is supposed to be there.

I realize that this is naive. 8-)
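
As a purely hypothetical illustration of the kind of per-run record
that would make the search sketched earlier mechanical (this is not
Gumpy's actual data model, just one plausible shape for it):

    from dataclasses import dataclass, field

    @dataclass
    class DependencyRun:
        """One project build within one Gump run - hypothetical shape."""
        project: str                   # the project that was built
        run_date: str                  # e.g. "2004-02-05"
        succeeded: bool                # True if the build passed
        dependency_versions: dict = field(default_factory=dict)
        # version of each dependency the build saw, e.g.
        # {"ant": "HEAD@2004-02-05", "xml-xerces": "HEAD@2004-02-05"}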

Stefan

Re: Nagging the cause vs. Nagging the effect

Posted by Stefan Bodewig <bo...@apache.org>.
On Thu, 5 Feb 2004, Stefano Mazzocchi <st...@apache.org> wrote:
> On 5 Feb 2004, at 09:22, Stefan Bodewig wrote:

>> I think Gumpy already collects, or at least could collect,
>> historical data for all dependency runs.
> 
> how? where?

<http://lsd.student.utwente.nl/gump/gump_stats/index.html>

in particular

<http://lsd.student.utwente.nl/gump/gump_stats/project_fogfactor.html>

which contains the number of failures, pre-req failures and successful
builds.  Currently it looks as if just the aggregate numbers are
preserved, but this is a start.
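
The exact fog-factor formula isn't given on that page; purely as an
assumed illustration, one simple metric over those three aggregate
counts is a plain success ratio (the real Gump computation may well
differ):

    def success_ratio(successes, failures, prereq_failures):
        """Fraction of recorded runs in which the project built.
        A rough stand-in, not Gump's actual fog-factor formula."""
        total = successes + failures + prereq_failures
        return successes / total if total else 0.0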

Stefan

Re: Nagging the cause vs. Nagging the effect

Posted by Stefano Mazzocchi <st...@apache.org>.
On 5 Feb 2004, at 09:22, Stefan Bodewig wrote:

> On Thu, 5 Feb 2004, Stefano Mazzocchi <st...@apache.org> wrote:
>> On 5 Feb 2004, at 02:17, Adam Jack wrote:
>>
>>> I've long wished that Gump could nag the 'cause' of a problem, not
>>> the 'effect', but it is (AFAICT) pretty much impossible to guess
>>> who the cause is from a compile failure.
>>
>> Tell you what: there have been looooong discussions about this, and I
>> spent endless hours on the whiteboard trying to figure out *where*
>> that data can emerge out of the entire mass of data that Gump is
>> either collecting or generating.
>>
>> I was still not able to find it, still not able to come up with a
>> general algorithm that would, if not identify the cause, at least
>> discriminate between "causing trends" and "affected trends".
>
> The way you'd do it manually is roughly as follows, I believe:
>
> * start with the last good build
>
> * replace the dependencies one at a time with their latest versions
>   and rebuild - repeat until the build fails.
>
> Gumpy should have enough data to do that, but the whole thing breaks
> down when Gump has been unable to build a project for weeks or even
> months as the number of changes becomes too big.

It is clear to me that we should develop a system that works at steady 
state; the bootstrap process (getting enough critical mass to attract 
attention) is way too complex to be automated.

But my gut feeling is that with a system that is running and has a 
reasonable nag/FoG metric/heuristic, the number of changes never 
becomes too big, because we stop them right at the beginning.

That's the beauty of continuous integration: it's a problem forecaster; 
it builds your project in a much wider scope than you could ever cover 
by yourself.

[in this sense, it's like what Google does when it lists backlinks to a 
page: that information is not available to you, the page author, 
because it's a property of the graph, not of the node]

>> I think the key is that the Gump runs, whether done by Gump or
>> Gumpy, do *not* contain enough information. But if we had both:
>>
>>   1) the latest dependency run
>>   2) the stable dependency run
>>
>> and we had enough history of these (say a few months), I'm pretty
>> sure the data *IS* there.
>
> I think Gumpy already collects, or at least could collect, historical
> data for all dependency runs.

how? where?

> If the dependency run has been
> successful at least once, the data is supposed to be there.
>
> I realize that this is naive. 8-)

Oh, believe me, nothing is naive in regard to data emergence. Even the 
slightest local trend that looks silly and obvious could exhibit great 
potential when applied to a complex topology (much like Google does for 
hyperlinks, Agora does for replies, and Amazon does for shopping carts, 
for example).

--
Stefano.