Posted to general@gump.apache.org by Stefano Mazzocchi <st...@apache.org> on 2004/12/07 00:13:28 UTC
[RT] Gump 3.0
I've been working for a while to describe an improved architecture for
Gump and I have decided to "go public" with the discussion because I
want this to be a community effort.
- o -
First and foremost, I believe that gump is one of the most exciting
things that have happened in the software space over the last few
years, but I also think that technical, architectural and social
limitations are preventing it from exhibiting its real potential.
The biggest problem I have is the fact that gump is such an integrated
system: it tries to do too much in one single stage.
Don't get me wrong: the internals of gump 2.x are rather modular and
well architected, but the overall system architecture is too monolithic.
So, here is my first suggestion: split gump in three stages.
1) metadata aggregation
2) build
3) build data use
- o -
Stage 1: Metadata aggregation
-----------------------------
Gump will socially scale only when the metadata about each project is
taken care of by the people who administer that project rather than by
a few gump meisters.
In this regard, I believe Maven to be far superior to ant in terms of
gump-friendliness because of its completely declarative nature (an ant
build is written in a procedural language, from which project metadata
cannot be transparently inferred).
In a perfect world, every project would *need* a metadata representation
of its structure so that a build tool can parse it and understand
what the project needs.
In the real world, there are two camps:
1) procedural: make, configure, sh, ant
2) declarative: maven, apt-get, ports
and the second normally builds on the first.
What gump (or apt-get, or BSD ports) absolutely needs is a
"declarative" layer on top of the "procedural" one for every project: a
'semantic' layer that the system can understand and work on.
Debian shows that it's possible to socially scale the concept of adding
a semantic layer on top of existing project efforts, in a completely
independent fashion.
Maven shows that it's possible for the projects themselves to make good
use of this information (and to call ant when special needs arise).
For gump, what's important is that having maven generate gump
descriptors is both stupid and inefficient: gump should be able to
digest the maven POM directly, without requiring any effort from the
project.
We should be maintaining the metadata representation only for the projects
that don't have that data integrated in their build system (like pure
ant projects or make/configure projects).
So, what is a metadata aggregation layer?
It's a crawler for project metadata. It crawls projects and their
descriptors and aggregates them into a service that can be queried to
obtain that information.
In short:
[bunch of locations] --> crawler --> metadata database
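To make the crawler idea concrete, here is a minimal Python sketch. It
assumes simplified, namespace-free POM-like descriptors held in memory
under hypothetical ids such as "asf:cocoon". It also assumes a
hypothetical two-table SQLite schema standing in for the metadata
database. A real crawler would fetch descriptors from CVS/SVN or HTTP
and would have to cope with the full Maven POM format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical in-memory "locations"; a real crawler would fetch these
# descriptors over CVS/SVN or HTTP.
LOCATIONS = {
    "asf:cocoon": """<project>
        <artifactId>cocoon</artifactId>
        <dependencies>
          <dependency><artifactId>xerces</artifactId></dependency>
          <dependency><artifactId>xalan</artifactId></dependency>
        </dependencies>
      </project>""",
    "asf:xerces": "<project><artifactId>xerces</artifactId></project>",
}


def parse_descriptor(text):
    """Return (name, [dependency names]) from a simplified POM-like descriptor."""
    root = ET.fromstring(text)
    name = root.findtext("artifactId")
    deps = [d.findtext("artifactId") for d in root.iter("dependency")]
    return name, deps


def crawl(locations, db):
    """Parse every descriptor and (re)store its metadata in the database."""
    db.execute("CREATE TABLE IF NOT EXISTS project (id TEXT PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS dependency (project TEXT, depends_on TEXT)")
    for project_id, text in locations.items():
        name, deps = parse_descriptor(text)
        db.execute("INSERT OR REPLACE INTO project VALUES (?, ?)", (project_id, name))
        # Drop stale rows so that re-crawling a project is idempotent.
        db.execute("DELETE FROM dependency WHERE project = ?", (project_id,))
        db.executemany("INSERT INTO dependency VALUES (?, ?)",
                       [(project_id, dep) for dep in deps])
    db.commit()


db = sqlite3.connect(":memory:")
crawl(LOCATIONS, db)
print(db.execute("SELECT depends_on FROM dependency "
                 "WHERE project = 'asf:cocoon' ORDER BY depends_on").fetchall())
```

Once the data sits in a queryable store, the build stage no longer
needs to know where each descriptor originally came from.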
- o -
Stage 2: Build
--------------
This is what we think of as "gump" today. In short, it's the service
that uses the project metadata, does the fetching, preparing and
building, and generates a bunch of data as a result.
The difference from today's gump is that this "build-only gump" outputs
data into a database, not into HTML pages or RSS feeds. The build
stage and the data use stage are separated.
In short:
metadata database --> gump --> build data database
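Here is a minimal sketch of such a build-only stage. It makes several
assumptions. The metadata is a hypothetical dict mapping each project
to its dependencies. fake_build() stands in for fetching, preparing and
building. Outcomes go into a hypothetical "result" table instead of
HTML. It relies on graphlib (Python 3.9+ standard library) to build in
dependency order. The "prereq-failed" state name is made up for
illustration.

```python
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical metadata, as stage 1 might expose it: project -> dependencies.
METADATA = {
    "xerces": [],
    "xalan": ["xerces"],
    "cocoon": ["xerces", "xalan"],
}


def fake_build(project):
    """Stand-in for fetch/prepare/build; pretend that xalan is broken."""
    return project != "xalan"


def run_build(metadata, build, db, run_id):
    """Build projects in dependency order and record each outcome as data."""
    db.execute("CREATE TABLE IF NOT EXISTS result (run INTEGER, project TEXT, state TEXT)")
    states = {}
    # static_order() yields every project after all of its dependencies.
    for project in TopologicalSorter(metadata).static_order():
        if any(states[dep] != "success" for dep in metadata[project]):
            state = "prereq-failed"  # a dependency did not build, so skip it
        else:
            state = "success" if build(project) else "failed"
        states[project] = state
        db.execute("INSERT INTO result VALUES (?, ?, ?)", (run_id, project, state))
    db.commit()
    return states


db = sqlite3.connect(":memory:")
print(run_build(METADATA, fake_build, db, run_id=1))
```

Nothing in this stage renders pages or sends mail. It only leaves rows
behind for stage 3 to consume.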
- o -
Stage 3: Build Data Use
-----------------------
This is what is performed today by the 'actors' inside Gump 2.x. The
current actors are:
1) document
2) repository
3) notify
4) stats
5) syndication
6) timing
7) rdf
8) mysql
9) results
We could group them into the following taxonomy:
[web]
  [html]
    document -> creates the forrest output
    results -> creates the XHTML output
    stats -> does the stats part
    timing -> does the timing part
  [others]
    syndication -> generates the RSS feeds
    RDF -> generates the RDF descriptors
[email]
  notify -> notifies the mailing lists
[history]
  mysql -> saves historical data
  repository -> saves the built jar files
My suggestion is to move all of these out of stage 2, keeping only the
"historical" actors there (basically pumping all the data into the
historical database) and letting the others reside in stage 3.
So, for stage 3 I see two possible services:
1) the web service, taking care of things like:
- web pages
- historical graphs
- syndication of results
2) the notification service, taking care of sending emails to the
various projects
In short:
metadata database   --+        +--> email notifier
                      +--------+
build data database --+        +--> webapp
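Because stage 3 only reads the two databases, a notifier can be a small,
independent program. The sketch below assumes the hypothetical "result"
table used in the build sketch plus a hypothetical "contact" table in
the metadata database. It only builds email.message.EmailMessage
objects and leaves actual delivery (for instance via smtplib) out. The
example.org addresses are placeholders.

```python
import sqlite3
from email.message import EmailMessage


def pending_notifications(build_db, metadata_db, run_id):
    """Turn non-successful builds of one run into email messages (nothing is sent)."""
    failures = build_db.execute(
        "SELECT project, state FROM result WHERE run = ? AND state != 'success' "
        "ORDER BY project", (run_id,)).fetchall()
    messages = []
    for project, state in failures:
        row = metadata_db.execute(
            "SELECT mailing_list FROM contact WHERE project = ?", (project,)).fetchone()
        if row is None:
            continue  # no known list to notify for this project
        msg = EmailMessage()
        msg["To"] = row[0]
        msg["Subject"] = f"[GUMP] Build {state}: {project}"
        msg.set_content(f"Gump run {run_id} reports state '{state}' for {project}.")
        messages.append(msg)
    return messages


# Hypothetical contents of the two databases.
build_db = sqlite3.connect(":memory:")
build_db.execute("CREATE TABLE result (run INTEGER, project TEXT, state TEXT)")
build_db.executemany("INSERT INTO result VALUES (?, ?, ?)", [
    (1, "xerces", "success"), (1, "xalan", "failed"), (1, "cocoon", "prereq-failed")])
metadata_db = sqlite3.connect(":memory:")
metadata_db.execute("CREATE TABLE contact (project TEXT, mailing_list TEXT)")
metadata_db.executemany("INSERT INTO contact VALUES (?, ?)", [
    ("xalan", "xalan-dev@example.org"), ("cocoon", "cocoon-dev@example.org")])

for msg in pending_notifications(build_db, metadata_db, run_id=1):
    print(msg["To"], "|", msg["Subject"])
```

A webapp such as the one below would read the same tables, without
either service knowing about the other.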
- o -
Advantages
----------
This new architecture has several advantages:
1) the concerns are more easily separated, which also means that
different stages can be built using different languages. The webapp
I'm working on, for example (codename 'dynagump', located in
http://svn.apache.org/repos/asf/gump/dynagump/trunk), is a Cocoon
application.
2) by decoupling the architecture, it's easier to have multiple
machines running the second stage in parallel (either controlled by us
or simply donated by users), for example:
                     --- Debian on x86 ---
                    /                     \
                   /                       v
metadata database ------ MacOSX on PPC ----> build data database
                   \                       ^
                    \                     /
                     --- WinXP on x86 ----
*and* it is also easier to install a "build stage" on a given machine,
since the metadata bootstrap phase should be done automatically. For
example, it should be sufficient to say "gump build asf:cocoon" for
the whole system to be prepared, packaged and ready to go.
3) also, by allowing gump to adapt the existing descriptors into a
database form, it's easier to empower users, either by allowing them to
maintain their data in the original form (i.e. Maven descriptors) or
by letting them adapt/modify the data in the database directly (for
example, through a web application).
4) the contracts between the stages are databases: once these models
are codified, the three stages can work in complete isolation, without
affecting one another.
Comments?
--
Stefano.
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org
Re: [RT] Gump 3.0
Posted by Stefano Mazzocchi <st...@apache.org>.
Adam R. B. Jack wrote:
> Ok, here is my thinking on how we proceed towards Gump 3.0, i.e.:
>
> 1) Metadata Gathering
> 2) Processing (Build/Sync/Update)
> 3) Results/Presentation/History Query/Analysis
>
> ------------------------------------------------------------------------
> For *now* ...
>
> 1) Phase One (Metadata Gathering) is simply the way to get XML documentation
> into a local file system for Gump to process. Eventually this could be
> crawlers (etc.) that parse GOMs and POMs, but (for now) the CVS update &
> HTTP gets are tolerable. [If anybody has an itch to tackle this first, speak
> up, but I think it is a reasonable/significant amount of work and (IMHO) can
> wait a little while longer.]
+1
> 2) Phase Two (Building) is what we currently have as core, but that outputs
> to an historical database (plus some files for those w/o huge databases). It
> will not do RDF/RSS/Atom/Notification/XHTML Presentation (or XDOCS). It will
> not do Stats (neither XHTML presentation nor internal to DBM) nor will it do
> XRef (XHTML).
+1
> 3) Phase Three (Analysis/Communication) is a whole new world; re-writing
> the 'will not do' list from above from the results database. This could be
> Python code, or Cocoon, or ...
>
> I'd like to focus my time on (2) and request that others help with (3).
I'm game. I can take ownership of #3.
> Question: We currently run JDK1.5 and Kaffe off TRUNK not LIVE. Ought we
> change this?
yeah, it makes sense.
> Alternatively, ought we perform this Gump work in a separate
> branch. I think I can add to the current w/o too much instability, then
> remove stuff when needed. I'm game to listen to others' opinions/concerns
> though.
Currently, Dynagump is the code name for "#3" and does not depend on any
code from Gump (only on a common database schema).
I think we should keep it the way it is for now; we can move stuff back
and forth later on, thanks to SVN.
> [FWIIW: Personally, I'd love to get back to NAnt building except that Mono
> is still my roadblock. I think Gump 3.0 ought to be far less resource bound,
> and it ought to help us simplify running/operating Gump. As such, I hope it leads
> to more users and hence more hands to help with NAnt, etc.]
I personally would love to see Mono stuff being gumped as well, but it's
a low priority for me ATM.
--
Stefano.
Re: [RT] Gump 3.0
Posted by Stefan Bodewig <bo...@apache.org>.
On Fri, 10 Dec 2004, Adam R. B. Jack <aj...@apache.org> wrote:
> [FWIIW: Personally, I'd love to get back to NAnt building except
> that Mono is still my roadblock.
I still don't quite understand why it works far better on my oldish
RedHat box either. Hmm, have we tried Mono 1.0.4 or even 1.0.5
(released today 8-) yet?
Anyway. Once I merge my last commit to the live branch we will build
apr-util against apr and everything should be there to support
configure/make based projects (we may need env variable support). My
next priority will be documenting the stuff so that others like Graham
can get their feet wet - and then head towards NAnt and Mono.
This is what I expect to be able to do, I'll probably never dive into
Python (lack of time - and admittedly it hasn't been fun yet, either)
deep enough in order to scratch more than the surface.
Stefan
Re: [RT] Gump 3.0
Posted by "Adam R. B. Jack" <aj...@apache.org>.
Ok, here is my thinking on how we proceed towards Gump 3.0, i.e.:
1) Metadata Gathering
2) Processing (Build/Sync/Update)
3) Results/Presentation/History Query/Analysis
------------------------------------------------------------------------
For *now* ...
1) Phase One (Metadata Gathering) is simply the way to get XML documentation
into a local file system for Gump to process. Eventually this could be
crawlers (etc.) that parse GOMs and POMs, but (for now) the CVS update &
HTTP gets are tolerable. [If anybody has an itch to tackle this first, speak
up, but I think it is a reasonable/significant amount of work and (IMHO) can
wait a little while longer.]
2) Phase Two (Building) is what we currently have as core, but that outputs
to an historical database (plus some files for those w/o huge databases). It
will not do RDF/RSS/Atom/Notification/XHTML Presentation (or XDOCS). It will
not do Stats (neither XHTML presentation nor internal to DBM) nor will it do
XRef (XHTML).
3) Phase Three (Analysis/Communication) is a whole new world; re-writing
the 'will not do' list from above from the results database. This could be
Python code, or Cocoon, or ...
I'd like to focus my time on (2) and request that others help with (3).
Question: We currently run JDK1.5 and Kaffe off TRUNK not LIVE. Ought we
change this? Alternatively, ought we perform this Gump work in a separate
branch. I think I can add to the current w/o too much instability, then
remove stuff when needed. I'm game to listen to others' opinions/concerns
though.
[FWIIW: Personally, I'd love to get back to NAnt building except that Mono
is still my roadblock. I think Gump 3.0 ought to be far less resource bound,
and it ought to help us simplify running/operating Gump. As such, I hope it leads
to more users and hence more hands to help with NAnt, etc.]
regards,
Adam
Re: [RT] Gump 3.0
Posted by "Adam R. B. Jack" <aj...@apache.org>.
> 2a) SCM update
> 2b) syncing updated working copy with workspace
> 2c) building
We do actually have 2a and 2c already, in bin/build.py and bin/update.py,
they just never got the usage/fixing they might need.
regards
Adam
Re: [RT] Gump 3.0
Posted by Stefan Bodewig <bo...@apache.org>.
On Wed, 08 Dec 2004, Stefan Bodewig <bo...@apache.org> wrote:
> On Mon, 06 Dec 2004, Stefano Mazzocchi <st...@apache.org> wrote:
>
>> So, here is my first suggestion: split gump in three stages.
>>
>> 1) metadata aggregation
>> 2) build
>> 3) build data use
>
> Sounds good.
One additional thing.
I'd love to have part 2 separated into at least three steps that can
get invoked individually:
2a) SCM update
2b) syncing updated working copy with workspace
2c) building
With "traditional Gump" it has been possible to modify classes in the
workspace and rebuild using Gump. This has been very useful in
resolving Gump problems in the past. Right now I don't see an easy
way to do this.
For example, I "fixed" the commons-jelly-tags-ant build by patching
the jelly-util taglib. I verified it would fix the Gump build by
applying my patch locally and only building commons-jelly-tags-util
and after that commons-jelly-tags-ant.
Using current Gump my local patch would have been blown away by CVS
updates or syncs - unless I applied it in what is supposed to be a
clean checkout and disconnected from the network.
Also, just building commons-jelly-tags-util and commons-jelly-tags-ant
without rebuilding Ant and all that seems to be impossible right now
(I may be wrong, though).
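Here is a hypothetical sketch of how the 2a/2b/2c split might look from
the command line. The --steps option and the step names are invented
for illustration. Each step can be selected on its own. Rebuilding only
the locally patched projects can therefore skip both the SCM update and
the sync that would otherwise overwrite the patch. The step functions
merely log what they would do.

```python
import argparse


# Hypothetical step actions; real ones would run CVS/SVN, a sync, ant, etc.
def scm_update(project, log):
    log.append(f"update {project}")  # 2a: refresh the clean working copy


def sync_workspace(project, log):
    log.append(f"sync {project}")  # 2b: copy the working copy over the workspace


def build(project, log):
    log.append(f"build {project}")  # 2c: build whatever is in the workspace


STEPS = {"update": scm_update, "sync": sync_workspace, "build": build}


def main(argv):
    parser = argparse.ArgumentParser(prog="gump")
    parser.add_argument("--steps", default="update,sync,build",
                        help="comma-separated subset of: update, sync, build")
    parser.add_argument("projects", nargs="+",
                        help="projects to process in the given order; dependencies are not rebuilt")
    args = parser.parse_args(argv)
    chosen = args.steps.split(",")
    unknown = [s for s in chosen if s not in STEPS]
    if unknown:
        parser.error(f"unknown step(s): {', '.join(unknown)}")
    log = []
    for project in args.projects:
        for name in ("update", "sync", "build"):  # always in 2a, 2b, 2c order
            if name in chosen:
                STEPS[name](project, log)
    return log


# Rebuild two projects from a locally patched workspace: no update, no sync.
for line in main(["--steps", "build", "commons-jelly-tags-util", "commons-jelly-tags-ant"]):
    print(line)
```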
Stefan
Re: [RT] Gump 3.0
Posted by Stefan Bodewig <bo...@apache.org>.
On Mon, 06 Dec 2004, Stefano Mazzocchi <st...@apache.org> wrote:
> So, here is my first suggestion: split gump in three stages.
>
> 1) metadata aggregation
> 2) build
> 3) build data use
Sounds good.
> We should be maintaining the metadata representation only for the
> projects that don't have that data integrated in their build system
> (like pure ant projects or make/configure projects).
Even the latter may have them in some form, like RPM spec files; it may
be worth looking into them (some time later) as well.
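As a rough illustration of mining RPM spec files, this sketch extracts
the Name and BuildRequires tags from a made-up spec fragment. It
handles only plain "Tag: value" preamble lines and drops version
constraints. RPM macros and conditionals are left unexpanded, so it is
far from a full spec parser.

```python
import re

# Made-up spec fragment, for illustration only.
SPEC = """\
Name: example-lib
Version: 1.0
BuildRequires: apr-devel >= 1.0, openssl-devel
BuildRequires: expat-devel
%description
An example library.
"""

TAG = re.compile(r"^(Name|BuildRequires)\s*:\s*(.+)$", re.IGNORECASE)
OPERATORS = {"<", "<=", "=", ">=", ">"}


def spec_metadata(spec_text):
    """Return (package name, [build-time dependencies]) from simple preamble tags."""
    name, build_requires = None, []
    for line in spec_text.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue
        tag, value = match.group(1).lower(), match.group(2).strip()
        if tag == "name":
            name = value
            continue
        # Entries may be separated by commas or spaces; "pkg >= ver" is one entry.
        tokens = value.replace(",", " ").split()
        i = 0
        while i < len(tokens):
            if tokens[i] in OPERATORS:
                i += 2  # skip the operator and the version that follows it
                continue
            build_requires.append(tokens[i])
            i += 1
    return name, build_requires


print(spec_metadata(SPEC))
```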
Stefan
Re: [RT] Gump 3.0
Posted by David Crossley <cr...@apache.org>.
Stefano Mazzocchi wrote:
> I've been working for a while to describe an improved architecture for
> Gump and I have decided to "go public" with the discussion because I
> want this to be a community effort.
It was great to participate with you and Leo IRL at ApacheCon
over some of this - another aspect of the community effort.
Thanks for going the next step.
[snip]
>
> Comments?
Stefano hits the nail on the head again.
--David
Re: [RT] Gump 3.0
Posted by "Adam R. B. Jack" <aj...@apache.org>.
> Comments?
This says it all for me:
> The biggest problem I have is the fact that gump is such an integrated
> system: it tries to do too much in one single stage.
I don't mind if the "contract"/communication between phases is some RDF
store, or database, or whatever, but I do want to have this separation. We
also need to ensure that (this time) we have the command-line run (Random Joe
running Gump) figured out. It needs to be as easy to do each/any stage
manually as Sam used to find it. Smaller steps might just make that easier
to achieve.
regards,
Adam
Re: [RT] Gump 3.0
Posted by Leo Simons <ls...@jicarilla.org>.
Stefano Mazzocchi wrote:
> Comments?
Not really. Most of it sounds obvious by now, actually :-D
More images related to this architecture are at:
http://svn.apache.org/repos/asf/gump/trunk/src/xdocs/gump.pdf
though I'm afraid some of the comments in the gump.ppt alongside there
didn't make it into the PDF.
I'll also point out that your RT (probably on purpose) leaves out a
*lot* of talk about (lifting) social limitations. The fun bit about the
thinking there is that it tends to span all those stages and databases.
That really needs to be written down as well at some point so some of
the design decisions make more sense :-D
Finally (just to keep this e-mail short, really; there's a lot more to
say), one other thing to realize is that this DB-based architecture
will help us move away from the batch-based approach we have right
now.
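Here is one possible reading of "moving away from batch runs", assuming
the dependency rows live in a queryable database. When a commit lands in
some project, only that project and its transitive dependents need
rebuilding, not the whole workspace. The DEPENDENCIES rows are
hypothetical. This is an illustration of the idea, not a description of
an existing Gump feature.

```python
from collections import defaultdict, deque

# Hypothetical (project, depends_on) rows as they might sit in the metadata database.
DEPENDENCIES = [
    ("xalan", "xerces"),
    ("cocoon", "xerces"),
    ("cocoon", "xalan"),
    ("ant", "xerces"),
]


def affected_by(changed, dependencies):
    """Return every project to rebuild when the `changed` projects get new commits."""
    dependents = defaultdict(set)
    for project, dep in dependencies:
        dependents[dep].add(project)
    # Breadth-first walk over the reverse dependency graph.
    seen, queue = set(changed), deque(changed)
    while queue:
        for project in dependents[queue.popleft()]:
            if project not in seen:
                seen.add(project)
                queue.append(project)
    return seen


print(sorted(affected_by(["xalan"], DEPENDENCIES)))
```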
- LSD