Posted to general@gump.apache.org by Stefano Mazzocchi <st...@apache.org> on 2004/12/09 01:32:34 UTC

[RT] Gump 3.0 - Database Model

Since I received no pushback on my proposal, let's move on to discussing 
the database model.

I think the first step is to identify the entities that we want to 
model, their relationships and their respective cardinality.

Here is what Leo and I came up with so far (attached as PDF).

Comments/criticism/questions appreciated.

-- 
Stefano.

Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Niclas Hedhman wrote:
> On Monday 13 December 2004 09:09, Stefano Mazzocchi wrote:
> 
> 
>>Eric, I really don't care what ID we choose, as long as it does identify
>>something univocally also in a global and distributed environment.
> 
> 
> RDF ?
> Isn't RDF a perfect fit for this kind of problems ?

eheh, no, URIs are :-)

Let's start with that first and we'll go a long way already.

-- 
Stefano.


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org

Re: [RT] Gump 3.0 - Database Model

Posted by Niclas Hedhman <ni...@hedhman.org>.

On Monday 13 December 2004 09:09, Stefano Mazzocchi wrote:

> Eric, I really don't care what ID we choose, as long as it does identify
> something univocally also in a global and distributed environment.

RDF ?
Isn't RDF a perfect fit for this kind of problems ?

Niclas
-- 
   +------//-------------------+
  / http://www.dpml.net       /
 / http://niclas.hedhman.org / 
+------//-------------------+


Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Eric Pugh wrote:
> Just catching up on my email after being gone for a week.  One thing that
> strikes me about the project id's is that this seems to continue the same
> discussion we have had in the past about maven generated project id's versus
> the gump project id's...
> 
> Do the project id's have to have meaning?  While it's nice to look at a
> project id and pick out some data, like the version and the timestamp or
> what not, eventually gump will run into another project where the id's mean
> something different and are generated differently.  I don't mind a project
> id like "787234" that I then look up and find out is what ever specific
> meaning it has.  Like version, or host, or whatnot.   I think that when we
> establish project naming conventions we'll run into conflicts with how other
> projects name themselves....
> 
> 
>>I would welcome project IDs of the form
>>
>>  http://www.apache.org/projects/cocoon
>>
>>and then
>>
>>  http://www.apache.org/projects/cocoon#v1.0
>>
>>for a particular released version, or
>>
>>  http://www.apache.org/projects/cocoon#20041210
>>

Eric, I really don't care what ID we choose, as long as it identifies 
something unambiguously, even in a global and distributed environment.

-- 
Stefano.


RE: [RT] Gump 3.0 - Database Model

Posted by Eric Pugh <ep...@opensourceconnections.com>.

Just catching up on my email after being gone for a week.  One thing that
strikes me about the project id's is that this seems to continue the same
discussion we have had in the past about maven generated project id's versus
the gump project id's...

Do the project id's have to have meaning?  While it's nice to look at a
project id and pick out some data, like the version and the timestamp or
what not, eventually gump will run into another project where the id's mean
something different and are generated differently.  I don't mind a project
id like "787234" that I then look up and find out is what ever specific
meaning it has.  Like version, or host, or whatnot.   I think that when we
establish project naming conventions we'll run into conflicts with how other
projects name themselves....

> I would welcome project IDs of the form
>
>   http://www.apache.org/projects/cocoon
>
> and then
>
>   http://www.apache.org/projects/cocoon#v1.0
>
> for a particular released version, or
>
>   http://www.apache.org/projects/cocoon#20041210
>


Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Adam R. B. Jack wrote:
>>Since I received no pushback on my proposal, let's move on discussing
>>the database model.
> 
> 
> I see this model is good enough for certain aspects of the proposed 3.0, but
> not for all. We can't store the metadata in it, in order to perform builds
> from, there is clearly insufficient information. 

Correct. That model comes from my work on Dynagump and it's a contract 
between the stage 2 (build) and stage 3 (presentation).

> That said, I am more than
> happy to start on a 3.0 break-up by splitting the outputs from the
> presentation of those outputs via this model.

Great. It would also be useful to know what kind of information you 
think you need (repositories come to mind... what else?)

> That said, I still need more information on the contents of ids (and such),
> to verify the model is correct. Here are some initial reactions:
> 
> One thing I noticed you mentioned was a desire for this database model to
> allow Gump to be distributed. 

Correct. This is critical once we start building native code, since we 
can't assume we'll have VMware-like virtual machines running all sorts of 
different OSes on Brutus.

But it's critical for Java too... kaffe is showing all sorts of 
weaknesses in the portability of Java code across platforms... I'm sure we 
would find such weaknesses even more exposed by running the builds on 
different architectures (hmmmm, makes me think that not all modules run 
on all operating systems... hmmmm, this requires a model change).

> I like that goal. We can't assume one host can
> do all builds (although Brutus is doing a fine fine job) so perhaps we could
> allow different hosts to build and contribute data for individual aspects.

That was my thinking.

> Maybe this is a goal to work towards, not focus on now, but I believe that
> "project id" including a host are not correct (they ought be independent of
> the host) 

Well, I completely agree that the project IDs should be independent of 
the host where they are built, but I think that in order to have global 
uniqueness, we need to have the IDs tied to something that identifies 
their provenance.

I would welcome project IDs of the form

  http://www.apache.org/projects/cocoon

and then

  http://www.apache.org/projects/cocoon#v1.0

for a particular released version, or

  http://www.apache.org/projects/cocoon#20041210

for a particular packaged snapshot of a project built at a particular 
time (note: NOT for the gump builds, those *need* to be identified with 
the host they originated from!)
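A minimal sketch of the ID scheme proposed above, assuming the base URI from the cocoon examples applies to every project (the thread does not say so). The function names are hypothetical; the fragment part carries either a release label or a snapshot date.

```python
from urllib.parse import urldefrag

# Assumed base, taken from the cocoon examples in the message above.
PROJECT_BASE = "http://www.apache.org/projects/"


def project_id(name, version=None):
    """Return a URI for a project; a release or snapshot goes in the fragment."""
    uri = PROJECT_BASE + name
    if version is not None:
        uri += "#" + version
    return uri


def split_project_id(uri):
    """Split an ID back into (project URI, version or None)."""
    base, fragment = urldefrag(uri)
    return base, (fragment or None)


print(project_id("cocoon"))
print(project_id("cocoon", "v1.0"))
print(split_project_id(project_id("cocoon", "20041210")))
```

Keeping the version in the fragment means that every ID for a release or snapshot still shares the plain project URI as a prefix.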

> [Q: Are we comfortable with allowing remote hosts to connect to a
> central MySQL database, or do we need an intermediary representation and more
> secure protocol for such?]

MySQL has a triple authentication scheme, which I like very much.

This assumes that you use MySQL in networking mode *and* that you bind to 
0.0.0.0 (if you just bind to 127.0.0.1 or if you have networking 
disabled, you can only connect from localhost).

First, it checks the IP of the machine connecting. If the machine is 
not listed in the allowed hosts, the connection is dropped. DoS attacks 
are still possible but the operation of dropping the connection is 
pretty fast, so an attack would saturate the bandwidth before doing any 
damage to the machine itself (there are way worse DoS attacks you can do 
already than this, so the risk is practically zero).

Also, given how widely MySQL is used, I'm sure that any 
buffer-overflow bug in that check would be reported and fixed in no time 
and would make so much noise that we'd hear it even if we were all on 
vacation ;-)

Second, if the IP is listed, it asks for a username and password. If the 
two match, the user is allowed in.

At this point, the username is used to look up the privileges. It is 
possible to define such a granular privilege system that a particular 
user is able only to perform a particular type of query on a particular 
table.

For example, we can allow hosts to perform "inserts" but not "updates". 
This means that even if an offender gets control of a machine that 
performs gump builds, it can only "add" some defective data but not 
modify the data that the host already dumped in the repository, 
preserving the validity of history for those tables (and, once we 
identify the intrusion, we can easily "cleanup" the database just by 
removing the data from that particular host from that point in time on...).

Note that only the "time modelling" tables will be open for 'insertion' 
from the outside. Those tables that don't model time (hosts, workspaces, 
projects, modules) will be maintained by *US* since, if damaged, it 
wouldn't be possible to "roll back" automatically.

The "granting privileges" operation in MySQL is trivial and can be 
performed with SQL queries directly.
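As a rough illustration of the insert-only idea, the sketch below builds MySQL GRANT statements for a build host. The database name, table names, user and address are all hypothetical, and the statements assume the account already exists. The helper does no escaping, so it should only ever be fed trusted identifiers.

```python
def grant_statements(user, host, database, tables, privileges=("INSERT",)):
    """Build one GRANT statement per table for the account user@host."""
    privs = ", ".join(privileges)
    return [
        f"GRANT {privs} ON {database}.{table} TO '{user}'@'{host}';"
        for table in tables
    ]


# Hypothetical "time modelling" tables that remote build hosts may append to.
for statement in grant_statements(
    "builder", "192.0.2.10", "gump", ["runs", "builds"]
):
    print(statement)
```

Because no UPDATE or DELETE privilege is granted, a compromised host could add rows but could not rewrite history, which is the property described above.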

> Do we need environment, i.e, kaffe or JDK 1.5 or  whatever?

Yes, we do, and they are identified by the "packages"... keep in mind 
that we might decide to build kaffe before building bootstrap-ant ;-)

This creates a problem though: we said we don't want per-workspace 
dependencies, but if we want to build kaffe and then be able to run 
bootstrap-ant with it, we need to be able to say so... one thing that 
comes to mind is to use "polymorphic" dependencies... which is the same 
thing that Debian does with "virtual packages".
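One way to picture such a "polymorphic" dependency is a virtual name that any of several concrete projects can satisfy. This is only a sketch: the virtual name "jvm" and the provider list are invented for illustration and are not part of any Gump schema.

```python
# Hypothetical mapping from a virtual package to the projects that provide it.
PROVIDERS = {
    "jvm": ["kaffe", "jdk-1.5"],
}


def resolve(dependency, available):
    """Return a concrete project that satisfies `dependency`, or None.

    A concrete name resolves to itself when it is available; a virtual
    name resolves to its first available provider.
    """
    if dependency in available:
        return dependency
    for provider in PROVIDERS.get(dependency, []):
        if provider in available:
            return provider
    return None


# bootstrap-ant would declare a dependency on "jvm" rather than on kaffe.
print(resolve("jvm", {"kaffe", "ant"}))
```

With this shape, a workspace that builds kaffe first and one that uses a packaged JDK can share the same bootstrap-ant declaration.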

Hmmmm

> Ought we have
> hosts/workspaces as mainly informational, with environment (what ought be
> the only differentiator for two builds of the same stuff, at exact time) as
> the key to builds?

This works for java, but wouldn't work for a general build.

> Do we need to allow "build output" to be optionally outside of the database,
> for those of us w/o terabytes to spare?

We can get the gump database hosted and maintained over at 
ayax.apache.org which has a few terabytes of disk space ;-)

> I like "dependency" within the database, but do we need more information
> (such as optional, etc.) on that?

Yeah, good point: the "type" of the dependency is needed.

> Also, one key piece of information in the current object model (which is
> used to document from) is "cause". We didn't build this thing 'cos X failed
> to build. That, along with annotations (we build this, but w/o X 'cos it was
> an optional failed dependency), seem important. Personally I like all the
> information on this page being available.
> 
>     http://brutus.apache.org/gump/public/ant/ant/details.html

Well, my strategy in building the database design was that duplication 
of information should be zero, everything else should be inferred from 
the model.

Since it is entirely possible to infer the "cause" for a build simply by 
querying the database, that information should not be stored 
explicitly.

The same thing can be said for the "percentage" of the failures, the FOG 
factors and all those things.

This is the biggest problem that I have with today's gump historical 
database: it's mainly a dump of the "post-processing" of today's gump 
logic... if I want to recalculate FOG factors based on a different 
heuristic, I'm screwed, because only the results were saved, the 
operands were lost!

Note, since the FOG queries are expensive, those will be cached by 
Dynagump and eventually placed into another, temporary database, but 
it's important that we understand that the principle of our model design 
is that no "heuristics" should be placed, only facts.

And "cause" estimation, as factual as it appears to be, is still a 
heuristic judgement.
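To make the "store facts, derive the rest" principle concrete, here is a small sketch using an in-memory SQLite database. The table layout, column names and result labels are assumptions made up for the example, not the schema from the PDF. Only build results and dependencies are stored, and the "cause" is derived by a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, heavily simplified tables: facts only.
    CREATE TABLE builds (id TEXT PRIMARY KEY, result TEXT);
    CREATE TABLE dependencies (dependant TEXT, dependee TEXT);
    INSERT INTO builds VALUES ('xml-xalan', 'failure');
    INSERT INTO builds VALUES ('xml-security', 'stalled');
    INSERT INTO dependencies VALUES ('xml-security', 'xml-xalan');
""")


def failed_dependencies(build_id):
    """List the direct dependencies of `build_id` whose build failed."""
    rows = conn.execute(
        """
        SELECT d.dependee
        FROM dependencies AS d
        JOIN builds AS b ON b.id = d.dependee
        WHERE d.dependant = ? AND b.result = 'failure'
        ORDER BY d.dependee
        """,
        (build_id,),
    )
    return [row[0] for row in rows]


print(failed_dependencies("xml-security"))
```

If the heuristic for "cause" changes later, only the query changes; the stored facts stay valid.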

> Maybe (as a transition) we generate simple pages from the existing object
> model, but generate a results database (with history) and migrate more and
> more to it over time.

Sure, I don't mind that.

> Thanks, both, for putting this together.

You're welcome.

It's actually a lot of fun :-)

-- 
Stefano.

Re: [RT] Gump 3.0 - Database Model

Posted by "Adam R. B. Jack" <aj...@apache.org>.

> Since I received no pushback on my proposal, let's move on discussing
> the database model.

I see this model is good enough for certain aspects of the proposed 3.0, but
not for all. We can't store the metadata in it and perform builds from it;
there is clearly insufficient information for that. That said, I am more than
happy to start on a 3.0 break-up by splitting the outputs from the
presentation of those outputs via this model.

That said, I still need more information on the contents of ids (and such),
to verify the model is correct. Here are some initial reactions:

One thing I noticed you mentioned was a desire for this database model to
allow Gump to be distributed. I like that goal. We can't assume one host can
do all builds (although Brutus is doing a fine fine job) so perhaps we could
allow different hosts to build and contribute data for individual aspects.
Maybe this is a goal to work towards, not focus on now, but I believe that
"project ids" including a host are not correct (they ought to be independent of
the host). [Q: Are we comfortable with allowing remote hosts to connect to a
central MySQL database, or do we need an intermediary representation and a more
secure protocol for such?]

Do we need environment, i.e, kaffe or JDK 1.5 or  whatever? Ought we have
hosts/workspaces as mainly informational, with environment (what ought be
the only differentiator for two builds of the same stuff, at exact time) as
the key to builds?

Do we need to allow "build output" to be optionally outside of the database,
for those of us w/o terabytes to spare?

I like "dependency" within the database, but do we need more information
(such as optional, etc.) on that?

Also, one key piece of information in the current object model (which is
used to document from) is "cause". We didn't build this thing 'cos X failed
to build. That, along with annotations (we build this, but w/o X 'cos it was
an optional failed dependency), seem important. Personally I like all the
information on this page being available.

    http://brutus.apache.org/gump/public/ant/ant/details.html

Maybe (as a transition) we generate simple pages from the existing object
model, but generate a results database (with history) and migrate more and
more to it over time.

Thanks, both, for putting this together.

regards

Adam


Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Stefano Mazzocchi wrote:
> Stefano Mazzocchi wrote:
> 
>> Since I received no pushback on my proposal, let's move on discussing 
>> the database model.
>>
>> I think the first step is to identify the entities that we want to 
>> model, their relationships and their respective cardinality.
>>
>> Here is what Leo and I came up with so far (attached as PDF).
>>
>> Comments/criticism/questions appreciated.
> 
> 
> Hmmm, trying again.

Damn, it seems that my attachments get filtered out. All right, find it 
over here:

   http://tinyurl.com/4qt9a

-- 
Stefano.


Re: [RT] Gump 3.0 - Database Model

Posted by Brett Porter <br...@gmail.com>.

Must be stripping attachments - maybe it can be put on the wiki or something?


On Wed, 08 Dec 2004 19:35:18 -0500, Stefano Mazzocchi
<st...@apache.org> wrote:
> Stefano Mazzocchi wrote:
> 
> 
> > Since I received no pushback on my proposal, let's move on discussing
> > the database model.
> >
> > I think the first step is to identify the entities that we want to
> > model, their relationships and their respective cardinality.
> >
> > Here is what Leo and I came up with so far (attached as PDF).
> >
> > Comments/criticism/questions appreciated.
> 
> Hmmm, trying again.
> 
> --
> Stefano.

Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Stefano Mazzocchi wrote:
> Since I received no pushback on my proposal, let's move on discussing 
> the database model.
> 
> I think the first step is to identify the entities that we want to 
> model, their relationships and their respective cardinality.
> 
> Here is what Leo and I came up with so far (attached as PDF).
> 
> Comments/criticism/questions appreciated.

Hmmm, trying again.

-- 
Stefano.

Re: [RT] Gump 3.0 - Database Model

Posted by Leo Simons <ma...@leosimons.com>.

On 26-05-2005 17:58, "Adam R. B. Jack" <aj...@apache.org> wrote:
> Could we get back to this thread above (using  http://tinyurl.com/4qt9a to
> get to the attachment) and see where we want to take it?

Yes we can!

It probably still needs a lot of work.

One thing that's wrong with it at the moment is that for example
project_dependencies don't have ids, and they should (because there's
additional information about a dependency, like it being optional, or it
being part of the "root cause" trail for a failure).

> I see that Gump3
> has a schema that does not include some of the additions mentioned in the
> thread.

If you're referring to branches/Gump3/gumpdb and its contents, that is, I
believe, more evolved than that PDF (last change December 28 vs December 08).

> Also, I'm trying to flesh out DynaGumper (the Gump3 DB plugin) and I'd like
> to populate the run/build information. I think I need project_version ids,
> but I can't figure out how to calculate them. Do I simply use
> http://www.apache.org/projects/{projectname}#20050526 or #HEAD or #gump or
> something?

Take a look at the example data in the sql file.

IIRC a "project_version" is a "project" that is "part of a specific gump
run". Those two need to be combined into an id. So if you have

<project name="blah"...>

And a "public" gump run started on "vmgump.apache.org" at "2005-05-29 at
21:43", your id becomes something like

 vmgump.apache.org:public:200505292143:blah

I.e. the current

  http://vmgump.apache.org/gump/public/xml-security/index.html

is related to

  vmgump.apache.org:public:200505271902:xml-security

right now, and will be related to

  vmgump.apache.org:public:200505281902:xml-security
                                 ^^ new run

tomorrow. At least that's how Stefano set it up, using "semi-URIs". I
would've probably prefixed everything with "urn:gump:" :-)
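The ID layout described above (host, workspace, run start time, project name, joined by colons) can be sketched as follows. The function name and the optional prefix argument are hypothetical; the prefix only illustrates the "urn:gump:" idea.

```python
from datetime import datetime


def project_version_id(host, workspace, started, project, prefix=""):
    """Combine a gump run (host, workspace, start time) and a project name."""
    stamp = started.strftime("%Y%m%d%H%M")  # e.g. 200505292143
    return prefix + ":".join([host, workspace, stamp, project])


run_start = datetime(2005, 5, 29, 21, 43)
print(project_version_id("vmgump.apache.org", "public", run_start, "blah"))
print(project_version_id("vmgump.apache.org", "public", run_start, "blah",
                         prefix="urn:gump:"))
```

Every project built in the same run shares the host, workspace and timestamp parts, so the IDs of one run differ only in the trailing project name.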

> Further, ought project dependencies (in project_dependencies) be
> between project versions not projects?

Project A is linked against the Project B compiled on a specific host as
part of a specific run. So yes, a project_version depends on another
project_version. I know the SQL gets this right.

Of course, there's also a declaration like this

 xml-security -depends-> xml-xalan

But that declaration tends to mutate over time.

> Finally, is anybody able to take on the DynaGump Cocoon webapp? I think we'd
> all benefit from seeing inside the database as we populate it.

Probably not atm. Stefano's rather busy building shiny tools AFAICT; we
don't have that many cocooners around I think :-)

Gump data visualisation is a hard problem. For now, you could also consider
writing some trivial commandline scripts that dump out some data to the
console :-)

Cheers!

LSD

Re: [RT] Gump 3.0 - Database Model

Posted by "Adam R. B. Jack" <aj...@apache.org>.

Could we get back to this thread above (using  http://tinyurl.com/4qt9a to
get to the attachment) and see where we want to take it? I see that Gump3
has a schema that does not include some of the additions mentioned in the
thread.

Also, I'm trying to flesh out DynaGumper (the Gump3 DB plugin) and I'd like
to populate the run/build information. I think I need project_version ids,
but I can't figure out how to calculate them. Do I simply use
http://www.apache.org/projects/{projectname}#20050526 or #HEAD or #gump or
something? Further, ought project dependencies (in project_dependencies) be
between project versions not projects?

Finally, is anybody able to take on the DynaGump Cocoon webapp? I think we'd
all benefit from seeing inside the database as we populate it.

regards,

Adam
----- Original Message ----- 
From: "Stefano Mazzocchi" <st...@apache.org>
To: "Gump" <ge...@gump.apache.org>
Sent: Wednesday, December 08, 2004 6:32 PM
Subject: [RT] Gump 3.0 - Database Model


> Since I received no pushback on my proposal, let's move on discussing
> the database model.
>
> I think the first step is to identify the entities that we want to
> model, their relationships and their respective cardinality.
>
> Here is what Leo and I came up with so far (attached as PDF).
>
> Comments/criticism/questions appreciated.
>
> -- 
> Stefano.
>
>
>



Re: [RT] Gump 3.0 - Database Model

Posted by Wa...@Lawson.com.

Leo:


Leo Simons <ls...@jicarilla.org> wrote on 12/16/2004 02:46:14 AM:

> [...]
> 
> yep, I seem to agree. Let's first implement the proposed setup and 
> optimize for understandability and cleanliness. Gump has a lot of 
> features already. Let's first focus on making the important ones easier 
> to use, then on making it easy to add the ones we want.

I totally agree.  Take the incremental approach in your implementation,
design a bit beyond your current needs (but not too much).  Seems to
remind me of an Einstein quote...  ;)


> I can't really "see through" Wade's setup right now (I'd like to see 
> more, it sounds very interesting :-D), but what I do have is a hunch that 
> it addresses quite a few use cases (like redistribution of stuff) which we 
> really don't want to worry about right now.

One significant difference, a differing requirement, between (my) Build
Results system and Gump 3.0 would be that Build Results really consumes
the output of several build systems, some nightly, some continuous
integration, etc., and we're planning on adding in CruiseControl as well.
It is the common point where all this information comes together.  There
was just too much legacy stuff (here) to attack all at once, so instead,
this approach seemed the more practical.  And now we are able to leverage
it in different ways (with new build loops, like CruiseControl,
where there's already a "publisher" interface).

wade

Re: [RT] Gump 3.0 - Database Model

Posted by Leo Simons <ls...@jicarilla.org>.

Stefano Mazzocchi wrote:
> Wade.Stebbings@Lawson.com wrote:
> 
>> Yes, it is a many-to-many relation between the Project and Group tables.
>> Thus, I can define one group which is all "mainline" builds (we have
>> several release streams managed by separate branches), regardless of
>> platforms built on.  Another group would be all Windows/2003 builds.  It
>> is merely a way of seeing a limited set of project names, though when
>> presented on the web page, I do also display some project attributes for
>> each project displayed, like the label & link of the current build as 
>> well as the "last good" build.
> 
> Ok, I see.
> 
> What I was thinking is that this (and other of your suggestions) adds a 
> "meta-metadata" layer and I'm not sure if I want to add this complexity 
> at this point (given that the model is complex enough already).
> 
> I agree that this meta-metadata layer will be very useful (for 
> annotation, grouping and further user interaction around the collected 
> data) but this is something we can add incrementally later on.

yep, I seem to agree. Let's first implement the proposed setup and 
optimize for understandability and cleanliness. Gump has a lot of 
features already. Let's first focus on making the important ones easier 
to use, then on making it easy to add the ones we want.

I can't really "see through" Wade's setup right now (I'd like to see 
more, it sounds very interesting :-D), but what I do have is a hunch that 
it addresses quite a few use cases (like redistribution of stuff) which we 
really don't want to worry about right now.

cheers,

- Leo

Re: [RT] Gump 3.0 - Database Model

Posted by Wa...@Lawson.com.

Stefano:

  see my responses below.

wade


Stefano Mazzocchi <st...@apache.org> wrote on 12/15/2004 09:40:28 PM:

> [...]
> 
> What I was thinking is that this (and other of your suggestions) adds a 
> "meta-metadata" layer and I'm not sure if I want to add this complexity 
> at this point (given that the model is complex enough already).
> 
> I agree that this meta-metadata layer will be very useful (for 
> annotation, grouping and further user interaction around the collected 
> data) but this is something we can add incrementally later on.

Yes.  This is a very easy thing to add-on later, "over the top" so to
speak, as none of the inner workings depend on it.  It is purely a way
to organize projects for presentation purposes.  Meta-meta?  Sure, why
not call it that.



> [...]
> 
> Ok, this is again another meta-metadata layer but this is something that 
> I'm not sure I like. It smells of overdesign and at this point I want to 
> keep features that are just critical for having the system working. "the 
> simplest thing that can possibly work".

Understood.  It is probably something more useful within my environment,
which is based on several different build systems that feed this system.


> [...]
> 
> Keep in mind that we DO NOT WANT gump to build anything that anybody 
> would start to use for their own stuff. It is critical, socially and 
> politically and for the security ecosystem that gump's artifact 
> repository is not used for anything other than distributed gumping 
> and fallback scenarios.
> 
> Consider it a cache, a repository of "precomputed calculations" rather 
> than anything else.
> 
> This is true for executables: for javadocs and docs, this is a different 
> story but we should not attack too many problems at the same time.

I see.  Our requirement was more broad for the Artifact Repository, and
thus it is "overloaded" to serve the build system itself (more gump like)
as well as internal (to the company) users for certain artifacts.  This
notion of an Artifact Repository is not very well fleshed out at the
moment, here, it is mostly design ideas at the present time.  We have
some pieces in place, mostly in a crude way.


> [...]
> > 
> > In fact, at present in my schema, for a single build table entry,
> > there can be:
> > 
> >  - any number of notes
> >  - any number of artifacts
> >  - any number of results
> 
> This is interesting. How can you have different numbers of results if 
> you have only one output signal for a given build?

Ah, that all depends on how 'result' is defined.  As a "Build Results"
system, in my case, it serves more than just to feed the build system
proper.  Thus, as an example:

 1. Building (e.g., compiling) --> one result
 2. Packaging --> 2nd result
 3. First level automated testing (eg., unit) --> 3rd result
 4. QA testing --> 4th result
...
 N. Overall

There is usually a fixed set of "result types" on a per project
basis, some projects might not bother with "QA testing" for an
example, some might fold packaging into the build proper, etc.
This is all very dynamic, of course, because a new type could be
added one day and then live on, another type could be phased out.
The presentation is setup to handle all this.

The one output signal to which you refer is probably #1, else it
is #N.  In my case, N is calculated and the calculation is again
a per-project parameter.  This might seem like unnecessary
overdesign to Gump, but there are reasons why this is needed
here--actually, plenty.
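A small sketch of the "several results per build, plus a calculated overall" idea. The message above does not say how the overall value is computed, so the rule used here (overall passes only if every result type the project requires has passed) is purely an assumption, as are the names.

```python
def overall(results, required):
    """Return 'pass' only if every required result type passed.

    `results` maps a result type to 'pass' or 'fail'. A required type
    that has no result yet counts as not passed.
    """
    ok = all(results.get(result_type) == "pass" for result_type in required)
    return "pass" if ok else "fail"


# Hypothetical per-project configuration: this project skips QA testing.
required_types = ["build", "package", "unit-test"]
build_results = {"build": "pass", "package": "pass", "unit-test": "fail"}
print(overall(build_results, required_types))
```

Keeping the required types as per-project data lets a new result type be phased in, or an old one phased out, without touching the calculation.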

The main points being:

 - although the build system produces artifacts, and in doing
   so there is status about that activity, thus one type of
   build result,

 - there are things we learn about those artifacts after they
   are produced, thus more results.

Bit of background on me, hopefully not to bore anyone.  I am in a
business environment, now, but came from years of cross-development
builds, embedded systems, etc.  To me, a build (proper) produces a
"stream of bytes" which most people call artifacts.  That stream of
bytes is further qualified as time goes on, usually in a series of
steps, and each step I have defined here as a result.  Many complex
systems have added new twists where "build tooling" itself is
produced during the build process, which I try to decompose into
separate builds, where subsequent builds then become consumers of
another build's artifacts--baselined to some level of goodness, one
would hope.  Not that I'm saying anything really new here, except
about my perspective on things.

wade

Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Wade.Stebbings@Lawson.com wrote:
> Stefano:
> 
> See my responses below.
> 
> 
> Stefano Mazzocchi <st...@apache.org> wrote on 12/10/2004 02:21:48 PM:
> 
>>Wade.Stebbings@Lawson.com wrote:
>>[...]
>>
>>>In my Build Results system, I have a schema that also includes a few
>>>additional things:
>>>
>>> - arbitrary groupings of projects, which helps in organizing various
>>>    forms of the presentation of the data
>>
>>Can you elaborate more on this?
> 
> 
> Yes, it is a many-to-many relation between the Project and Group tables.
> Thus, I can define one group which is all "mainline" builds (we have
> several release streams managed by separate branches), regardless of
> platforms built on.  Another group would be all Windows/2003 builds.  It
> is merely a way of seeing a limited set of project names, though when
> presented on the web page, I do also display some project attributes for
> each project displayed, like the label & link of the current build as 
> well as the "last good" build.

Ok, I see.

What I was thinking is that this (and others of your suggestions) adds a 
"meta-metadata" layer and I'm not sure if I want to add this complexity 
at this point (given that the model is complex enough already).

I agree that this meta-metadata layer will be very useful (for 
annotation, grouping and further user interaction around the collected 
data) but this is something we can add incrementally later on.

>>> - the general notion of "attributes" associated with each:
>>>    - build (instance)
>>>    - project
>>>    - group
>>>    - the whole system
>>
>>"attributes" as in "annotations" or as in "related data"?
> 
> I'm not sure what is the difference between annotations (like added
> notes, do you mean?) and "related data".  But what these tables do
> is basically to allow one to add new fields to the associated table
> without a schema change.  They are name/value pairs, with the added
> key (foreign key) of the id of the table to which they relate.  Thus,
> for the Project table:
> 
>   Project Attributes:
>     - proj_id (foreign key)
>     - name (string, key)
>     - value (blob)
> 
>   Project:
>     - proj_id (key)
>     - ... etc ...
> 
> In the case of the system wide attributes table, there is no "id"
> field.  That table I use for stuff like debug on/off/level, motd,
> and so far little else.

Ok, this is again another meta-metadata layer but this is something that 
I'm not sure I like. It smells of overdesign and at this point I want to 
keep features that are just critical for having the system working. "the 
simplest thing that can possibly work".

>>>And since my system is focused on creating interaction between people
>>>about given built baselines, I have the notion of a notes history
>>>associated with any given build, in a similar spirit as the comment
>>>history of a given "bug" in bugzilla.
>>
>>I like the concept of allowing bugzilla-style communication to happen 
>>without requiring people to subscribe to various mail lists, like a 
>>common ground for communication to happen.
>>
>>But I don't want this to be too global, because I want gump-related 
>>discussions to happen on the mail list.
> 
> You could tie-in email notification when this table is updated.  We
> don't do that, but it's not a bad idea.  Bugzilla of course does this.

Good suggestion. Again, this applies to the meta-metadata layer but it 
strikes me as a very useful feature to have right away. What do others 
think?

>>>Like the notes table, I have separate tables for (references to) artifacts,
>>
>>yes, the artifact table is missing, that's a good point.
> 
> 
> I use the notion of an external Artifact Repository and refer into
> that with this table.  The artifacts themselves are not stored in
> the database nor on the database server.  Just wanted to be clear
> about that.
> 
> The notion of an Artifact Repository: ah, well, I have my idea of
> what I want, and then there's the reality that we don't have much
> more (at present) than a web-based storage mechanism, organized
> hierarchically within the file system.  Thus "version information"
> is exposed in the file-path name space, and 3rd party artifacts
> are managed in yet another system.  My notion of an Artifact
> Repository would be a place to store any 3rd party artifact that
> any build could depend on.  Builds themselves would be producers,
> but could also be consumers.  One of the main points of this is:
> that I separate, architecturally, the Artifact Repository, as a
> separate service, from the build system itself.

Keep in mind that we DO NOT WANT gump to build anything that anybody 
would start to use for their own stuff. It is critical, socially and 
politically and for the security ecosystem, that gump's artifact 
repository is not used for anything other than distributed gumping 
and fallback scenarios.

Consider it a cache, a repository of "precomputed calculations" rather 
than anything else.

This is true for executables: for javadocs and docs, this is a different 
story but we should not attack too many problems at the same time.

>>>and another for results, to support any arbitrary number of
>>>artifacts/results to a given build-instance.
>>
>>Good point.
>>
>>[...]
>>
>>So, things missing are:
>>
>>  1) bugzilla like comments (on build results only? or what else?)
>>  2) artifact table / artifact type table
>>
>>Anything else you guys see missing?
> 
> 
> Note: The results per build table (to support an arbitrary number
> of results per build) was a separate table from the artifact table.
> 
> In fact, at present in my schema, for a single build table entry,
> there can be:
> 
>  - any number of notes
>  - any number of artifacts
>  - any number of results

This is interesting. How can you have different numbers of results if 
you have only one output signal for a given build?

[snip color coding]

> Thanks for including me in the discussion.  I look forward to more.

me too! :-)

-- 
Stefano.



Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Wade.Stebbings@Lawson.com wrote:
> Stefano,
> 
> Some afterthoughts.  Hopefully to help clarify.  The scope of a "Project"
> in our system (currently) is that of a build (a series of builds) for a
> given instance of (1) product-release on a given (2) target.  This of course
> means that a single configuration for a given instance of #1 would then
> "fan out" to several "Projects" (as we have used this word).
> 
> I am not completely happy with this arrangement, since our "Project"
> does not distinguish between:
> 
>  (a) separate configurations, or
>  (b) the same configurations built on different targets.
> 
> And somehow I think this distinction should be more clearly represented
> in the data model.
> 
> I think if (1) were to be defined as the "Project" and the (2)'s under it
> would be "SubProject" (to use some names), and keep the arbitrary
> grouping mechanism, though now at the SubProject level, then I think
> we've gained something w/o any other feature loss.

I'm sorry, I totally lost you here :-/

-- 
Stefano.



Re: [RT] Gump 3.0 - Database Model

Posted by Wa...@Lawson.com.

Stefano,

Some afterthoughts.  Hopefully to help clarify.  The scope of a "Project"
in our system (currently) is that of a build (a series of builds) for a
given instance of (1) product-release on a given (2) target.  This of course
means that a single configuration for a given instance of #1 would then
"fan out" to several "Projects" (as we have used this word).

I am not completely happy with this arrangement, since our "Project"
does not distinguish between:

 (a) separate configurations, or
 (b) the same configurations built on different targets.

And somehow I think this distinction should be more clearly represented
in the data model.

I think if (1) were to be defined as the "Project" and the (2)'s under it
would be "SubProject" (to use some names), and keep the arbitrary
grouping mechanism, though now at the SubProject level, then I think
we've gained something w/o any other feature loss.
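
To make the proposal concrete, here is a rough sketch under my own assumptions (the class names and the sample release/target names are hypothetical): a Project is one product-release, it fans out into one SubProject per target, and the arbitrary groups now hold SubProjects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubProject:
    project: str  # the product-release this belongs to, item (1) above
    target: str   # the target it is built on, item (2) above


@dataclass
class Project:
    name: str
    targets: list = field(default_factory=list)

    def subprojects(self):
        """Fan one product-release out into one SubProject per target."""
        return [SubProject(self.name, t) for t in self.targets]


# Arbitrary grouping kept, but now at the SubProject level.
release = Project("product-9.0", ["win2003", "linux"])
groups = {
    "windows-2003": {sp for sp in release.subprojects() if sp.target == "win2003"},
    "product-9.0-all": set(release.subprojects()),
}
```

With this shape, "same configuration on a different target" is two SubProjects under one Project, while a separate configuration is a separate Project.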

wade


Wade.Stebbings@Lawson.com wrote on 12/13/2004 09:07:32 AM:

> Stefano:
> 
> See my responses below.
> 
> 
> [...]

Re: [RT] Gump 3.0 - Database Model

Posted by Wa...@Lawson.com.

Stefano:

See my responses below.

Stefano Mazzocchi <st...@apache.org> wrote on 12/10/2004 02:21:48 PM:
> Wade.Stebbings@Lawson.com wrote:
> [...]
> > In my Build Results system, I have a schema that also includes a few
> > additional things:
> > 
> >  - arbitrary groupings of projects, which helps in organizing various
> >     forms of the presentation of the data
> 
> Can you elaborate more on this?

Yes, it is a many-to-many relation between the Project and Group tables.
Thus, I can define one group which is all "mainline" builds (we have
several release streams managed by separate branches), regardless of
platforms built on.  Another group would be all Windows/2003 builds.  It
is merely a way of seeing a limited set of project names, though when
presented on the web page, I do also display some project attributes for
each project displayed, like the label & link of the current build as 
well as the "last good" build.
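
A minimal sketch of such a many-to-many relation, using SQLite from Python so it runs standalone (the real system is in MySQL).  The table names, column names and sample rows are assumptions for illustration, not the actual schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE grp     (grp_id  INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Link table: one row per (project, group) membership.
    CREATE TABLE project_grp (
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        grp_id  INTEGER NOT NULL REFERENCES grp(grp_id),
        PRIMARY KEY (proj_id, grp_id)
    );
""")

conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(1, "mainline-win2003"), (2, "mainline-linux"), (3, "rel8-win2003")])
conn.executemany("INSERT INTO grp VALUES (?, ?)",
                 [(1, "mainline"), (2, "windows-2003")])
# A project may sit in several groups, and a group holds several projects.
conn.executemany("INSERT INTO project_grp VALUES (?, ?)",
                 [(1, 1), (2, 1), (1, 2), (3, 2)])


def projects_in_group(group_name):
    """Return the names of all projects that belong to the given group."""
    rows = conn.execute(
        """SELECT p.name
             FROM project p
             JOIN project_grp pg ON pg.proj_id = p.proj_id
             JOIN grp g ON g.grp_id = pg.grp_id
            WHERE g.name = ?
            ORDER BY p.name""",
        (group_name,),
    )
    return [name for (name,) in rows]
```

Here "mainline-win2003" shows up both in the "mainline" group and in the "windows-2003" group, which is the limited-view behaviour described above.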

> >  - the general notion of "attributes" associated with each:
> >     - build (instance)
> >     - project
> >     - group
> >     - the whole system
> 
> "attributes" as in "annotations" or as in "related data"?

I'm not sure what is the difference between annotations (like added
notes, do you mean?) and "related data".  But what these tables do
is basically to allow one to add new fields to the associated table
without a schema change.  They are name/value pairs, with the added
key (foreign key) of the id of the table to which they relate.  Thus,
for the Project table:

  Project Attributes:
    - proj_id (foreign key)
    - name (string, key)
    - value (blob)

  Project:
    - proj_id (key)
    - ... etc ...

In the case of the system wide attributes table, there is no "id"
field.  That table I use for stuff like debug on/off/level, motd,
and so far little else.
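
As a sketch only, the name/value layout above could look like this in SQLite from Python.  The composite key on (proj_id, name), the helper function names and the sample values are my assumptions; the message itself only lists the three columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (proj_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per extra "field"; adding a field needs no schema change.
    CREATE TABLE project_attributes (
        proj_id INTEGER NOT NULL REFERENCES project(proj_id),
        name    TEXT    NOT NULL,
        value   BLOB,
        PRIMARY KEY (proj_id, name)   -- assumed: one value per name per project
    );
    -- System-wide attributes have no owning id, only name/value.
    CREATE TABLE system_attributes (name TEXT PRIMARY KEY, value BLOB);
""")
conn.execute("INSERT INTO project VALUES (1, 'mainline')")
conn.execute("INSERT INTO system_attributes VALUES ('debug', 'off')")


def set_project_attr(proj_id, name, value):
    """Create or overwrite one attribute of a project."""
    conn.execute(
        "INSERT OR REPLACE INTO project_attributes VALUES (?, ?, ?)",
        (proj_id, name, value),
    )


def get_project_attr(proj_id, name, default=None):
    """Look up one attribute, falling back to `default` when it is not set."""
    row = conn.execute(
        "SELECT value FROM project_attributes WHERE proj_id = ? AND name = ?",
        (proj_id, name),
    ).fetchone()
    return row[0] if row else default


set_project_attr(1, "last_good_label", "b1041")
set_project_attr(1, "last_good_label", "b1042")  # overwrites the earlier value
```

The trade-off is the usual one for this pattern: flexibility without schema changes, at the cost of the database no longer knowing which attributes exist or what type they have.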

> > And since my system is focused on creating interaction between people
> > about given built baselines, I have the notion of a notes history
> > associated with any given build, in a similar spirit as the comment
> > history of a given "bug" in bugzilla.
> 
> I like the concept of allowing bugzilla-style communication to happen 
> without requiring people to subscribe to various mail lists, like a 
> common ground for communication to happen.
> 
> But I don't want this to be too global, because I want gump-related 
> discussions to happen on the mail list.

You could tie-in email notification when this table is updated.  We
don't do that, but it's not a bad idea.  Bugzilla of course does this.
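
Roughly what that tie-in might look like, as a sketch and not something we run: adding a note also builds an email and hands it to whatever delivery callable is passed in.  The function name, the list address and the subject format are all made up for the example.

```python
from email.message import EmailMessage


def add_note(notes, build_label, author, text, notify):
    """Append a note to a build's history and pass a notification to `notify`."""
    notes.append((build_label, author, text))

    msg = EmailMessage()
    msg["Subject"] = f"[build {build_label}] new note from {author}"
    msg["To"] = "builds@example.org"  # hypothetical list address
    msg.set_content(text)

    notify(msg)  # delivery is left to the caller
    return msg


# Here the "delivery" just collects messages in a list.
notes, outbox = [], []
add_note(notes, "b1042", "wade", "Smoke test failed on Windows.", outbox.append)
```

For real delivery, `notify` could wrap `smtplib.SMTP.send_message`; keeping it a parameter means the notes table logic does not depend on a mail server being reachable.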

> > Like the notes table, I have separate tables for (references to) artifacts,
> 
> yes, the artifact table is missing, that's a good point.

I use the notion of an external Artifact Repository and refer into
that with this table.  The artifacts themselves are not stored in
the database nor on the database server.  Just wanted to be clear
about that.

The notion of an Artifact Repository: ah, well, I have my idea of
what I want, and then there's the reality that we don't have much
more (at present) than a web-based storage mechanism, organized
hierarchically within the file system.  Thus "version information"
is exposed in the file-path name space, and 3rd party artifacts
are managed in yet another system.  My notion of an Artifact
Repository would be a place to store any 3rd party artifact that
any build could depend on.  Builds themselves would be producers,
but could also be consumers.  One of the main points of this is:
that I separate, architecturally, the Artifact Repository, as a
separate service, from the build system itself.
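
A small sketch of the "refer into the repository" idea, with assumed table and column names and a made-up repository URL: the database keeps only a location string per artifact, never the bytes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE build (build_id INTEGER PRIMARY KEY, label TEXT NOT NULL);
    -- Only a reference is stored; the artifact itself lives in the repository.
    CREATE TABLE artifact (
        artifact_id INTEGER PRIMARY KEY,
        build_id    INTEGER NOT NULL REFERENCES build(build_id),
        location    TEXT    NOT NULL
    );
""")


def record_artifact(build_id, location):
    """Register where an artifact of this build can be fetched from."""
    cur = conn.execute(
        "INSERT INTO artifact (build_id, location) VALUES (?, ?)",
        (build_id, location),
    )
    return cur.lastrowid


def artifacts_for(build_id):
    """List the repository locations recorded for one build."""
    rows = conn.execute(
        "SELECT location FROM artifact WHERE build_id = ? ORDER BY artifact_id",
        (build_id,),
    )
    return [location for (location,) in rows]


conn.execute("INSERT INTO build VALUES (1, 'b1042')")
# Hypothetical path layout: version information exposed in the path.
record_artifact(1, "http://repo.example.com/mainline/b1042/app.jar")
record_artifact(1, "http://repo.example.com/mainline/b1042/app-docs.zip")
```

Because the table holds nothing but references, the repository can be moved or replaced by a different service without touching the build records beyond the location strings.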

> > and another for results, to support any arbitrary number of
> > artifacts/results to a given build-instance.
> 
> Good point.
> 
> [...]
> 
> So, things missing are:
> 
>   1) bugzilla like comments (on build results only? or what else?)
>   2) artifact table / artifact type table
> 
> Anything else you guys see missing?

Note: The results per build table (to support an arbitrary number
of results per build) was a separate table from the artifact table.

In fact, at present in my schema, for a single build table entry,
there can be:

 - any number of notes
 - any number of artifacts
 - any number of results

I separate artifacts (products of a build) from results (metadata
or things we know about or learn about the build products).  A 
result entry has one of four possible states in my schema: 1. unset,
2. pass, 3. warn, 4. fail (to which I map the obvious color in the
web presentation ;) -- extrapolating/generalizing that my sampling
of the world's traffic/semaphore lights extends to the rest of the
world; 7 countries on 4 continents - a good but small sample).  And
unset = white.
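
In sketch form, with hypothetical class names: one build entry holding any number of notes, artifacts and results, and each result carrying one of the four states.  Only "unset = white" is stated above; green, yellow and red are my reading of the traffic-light mapping.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResultState(Enum):
    UNSET = 1
    PASS = 2
    WARN = 3
    FAIL = 4


# Presentation colors; all but white are assumed from the traffic-light analogy.
COLOR = {
    ResultState.UNSET: "white",
    ResultState.PASS: "green",
    ResultState.WARN: "yellow",
    ResultState.FAIL: "red",
}


@dataclass
class Result:
    name: str
    state: ResultState = ResultState.UNSET  # nothing known yet


@dataclass
class BuildEntry:
    label: str
    notes: list = field(default_factory=list)      # any number of notes
    artifacts: list = field(default_factory=list)  # any number of artifacts
    results: list = field(default_factory=list)    # any number of results


build = BuildEntry("b1042")
build.results.append(Result("compile", ResultState.PASS))
build.results.append(Result("unit-tests", ResultState.WARN))
build.results.append(Result("install-check"))  # not yet evaluated
colors = [COLOR[r.state] for r in build.results]
```

A build with one compile step and two later qualification steps thus carries three results, each with its own state.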

Thanks for including me in the discussion.  I look forward to more.

wade

Re: [RT] Gump 3.0 - Database Model

Posted by Stefano Mazzocchi <st...@apache.org>.

Wade.Stebbings@Lawson.com wrote:
> This is cool.  FWIW, here's some bits from my experience, implementing
> something similar in a MySQL database.

Awesome!

> In my Build Results system, I have a schema that also includes a few
> additional things:
> 
>  - arbitrary groupings of projects, which helps in organizing various
>     forms of the presentation of the data

Can you elaborate more on this?

>  - the general notion of "attributes" associated with each:
>     - build (instance)
>     - project
>     - group
>     - the whole system

"attributes" as in "annotations" or as in "related data"?

> And since my system is focused on creating interaction between people
> about given built baselines, I have the notion of a notes history
> associated with any given build, in a similar spirit as the comment
> history of a given "bug" in bugzilla.

I like the concept of allowing bugzilla-style communication to happen 
without requiring people to subscribe to various mail lists, like a 
common ground for communication to happen.

But I don't want this to be too global, because I want gump-related 
discussions to happen on the mail list.

> Like the notes table, I have separate tables for (references to) artifacts,

yes, the artifact table is missing, that's a good point.

> and another for results, to support any arbitrary number of
> artifacts/results to a given build-instance.

Good point.

> This could be hidden in your diagram inside the "builds" entity/table,
> but wasn't explicit.

No, you're right, we need to add that.

> I've built a lot of generality into my schema, since I need to support many
> inputs into this database, from various (new and old) build systems.  Thus
> things like the result table are kept very general within the database. One
> area that is not very well thought out (in my case) is how results and/or
> build instances depend on each other, a core requirement for Gump, as
> it seems.
> 
> Hope this helps.  Comments?

So, things missing are:

  1) bugzilla like comments (on build results only? or what else?)
  2) artifact table / artifact type table

Anything else you guys see missing?

-- 
Stefano.



Re: [RT] Gump 3.0 - Database Model

Posted by Wa...@Lawson.com.

This is cool.  FWIW, here's some bits from my experience, implementing
something similar in a MySQL database.


In my Build Results system, I have a schema that also includes a few
additional things:

 - arbitrary groupings of projects, which helps in organizing various
    forms of the presentation of the data

 - the general notion of "attributes" associated with each:
    - build (instance)
    - project
    - group
    - the whole system

And since my system is focused on creating interaction between people
about given built baselines, I have the notion of a notes history
associated with any given build, in a similar spirit as the comment
history of a given "bug" in bugzilla.

Like the notes table, I have separate tables for (references to) artifacts,
and another for results, to support any arbitrary number of artifacts/results
to a given build-instance.  This could be hidden in your diagram inside the
"builds" entity/table, but wasn't explicit.

I've built a lot of generality into my schema, since I need to support many
inputs into this database, from various (new and old) build systems.  Thus
things like the result table are kept very general within the database. One
area that is not very well thought out (in my case) is how results and/or
build instances depend on each other, a core requirement for Gump, as
it seems.

Hope this helps.  Comments?

wade


Stefano Mazzocchi <st...@apache.org> wrote on 12/08/2004 06:32:34 PM:

> Since I received no pushback on my proposal, let's move on discussing 
> the database model.
> 
> I think the first step is to identify the entities that we want to 
> model, their relationships and their respective cardinality.
> 
> Here is what Leo and I came up with so far (attached as PDF).
> 
> Comments/criticism/questions appreciated.
> 
> -- 
> Stefano.