Posted to general@gump.apache.org by Leo Simons <ma...@leosimons.com> on 2005/04/16 17:31:10 UTC

[RT] module, project, target = repository, module, project...

Hi gang!

Currently, a <module/> element tends to correspond roughly to the
ant|maven|make definition of a project, and a <project/> element tends to
correspond roughly to the ant|make definition of a target or the maven
definition of a goal. I.e.

  <repository name="ant">
  <module name="ant">
    <project name="bootstrap"/>
    <project name="ant"/>
    <project name="dist"/>
  </module>
  </repository>

It might make sense to make this

  <module name="ant">
  <project name="ant">
    <target name="bootstrap"/>
    <target name="build"/>
    <target name="dist"/>
  </project>
  </module>

However, we've also got stuff like

  <repository name="jakarta">
    <module name="jakarta-commons">
      <project name="jakarta-commons-collections"/>
      <project name="jakarta-commons-net"/>
      <project name="jakarta-commons-io"/>
    </module>
  </repository>

Where one would hardly want to make "jakarta-commons" be an actual project.
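
As a rough sketch of the mechanical part of this rename (not Gump code: `remodel` is a hypothetical helper, and the XML is the simplified form shown above), Python's standard `xml.etree.ElementTree` could fold a module's projects into targets of a single project. Note that it keeps the old names as-is, so the "ant" project becomes an "ant" target rather than "build":

```python
import xml.etree.ElementTree as ET


def remodel(module):
    """Turn <module><project/>*</module> into
    <module><project><target/>*</project></module>.

    Hypothetical helper: the single new project reuses the module name.
    """
    new_module = ET.Element("module", {"name": module.get("name")})
    project = ET.SubElement(new_module, "project", {"name": module.get("name")})
    for old_project in module.findall("project"):
        # Each old <project/> becomes a <target/> with the same name.
        ET.SubElement(project, "target", {"name": old_project.get("name")})
    return new_module


ant = ET.fromstring(
    '<module name="ant">'
    '<project name="bootstrap"/><project name="ant"/><project name="dist"/>'
    "</module>"
)
remodelled = remodel(ant)
targets = [t.get("name") for t in remodelled.find("project").findall("target")]
print(targets)
```

Applied to the jakarta-commons module, the same function would produce one "jakarta-commons" project whose targets are the individual libraries, which is exactly the case that does not fit.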

The more I look at this, the more I dislike how all this is set
up, esp. as it's not very consistent across different projects, and we don't
have very clear guidelines on how people should be doing this (other than
"copy existing practices"). It's a little messy.

I'm tempted to do a radical remodelling of our metadata structure to remove
this kind of ambiguity, even going as far as having conventions like
project-name-is-file-name be gently enforced.

Oh, ehm, I was even briefly tempted to turn our model into RDF but there
ain't that many good tools for RDF editing :-D

Your comments?

Cheers,

LSD



---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org

Re: [RT] module, project, target = repository, module, project...

Posted by "Adam R. B. Jack" <aj...@apache.org>.

> I'm tempted to do a radical remodelling of our metadata structure to
> remove this kind of ambiguity, even going as far as having conventions
> like project-name-is-file-name be gently enforced.

We are rebuilding Gump from the bottom up, so why not do the same with the
metadata? I'm game for it. I say we create a Gump3 workspace on Brutus to
run the minimum (e.g. up to Ant) and we work and re-work it until we like
it. We can throw in all the "rotten" test cases we like, such as Jakarta
Commons and so forth. Once we like it we can migrate the whole set of
metadata, which we could likely script (for 80+%).

> Oh, ehm, I was even briefly tempted to turn our model into RDF but there
> ain't that many good tools for RDF editing :-D

I'm repeating what I've written before, but for my tuppence ... I think
folks are most comfortable with XML, even if RDF makes good sense as a set of
statements about a module/project/artifact. I say we stick w/ XML, have us
generate RDF triples to match the metadata, and (eventually) allow RDF for
input (when we allow Maven descriptors, etc.)
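
A minimal sketch of the "keep XML, generate triples" idea, using only the Python standard library. The predicate names `is_a` and `contains`, the `to_triples` helper, and the plain-tuple representation are assumptions made for illustration; a real setup would presumably use proper URIs and an RDF library:

```python
import xml.etree.ElementTree as ET

METADATA = """
<repository name="jakarta">
  <module name="jakarta-commons">
    <project name="jakarta-commons-net"/>
    <project name="jakarta-commons-io"/>
  </module>
</repository>
"""


def to_triples(element, parent_name=None):
    """Yield (subject, predicate, object) tuples for an element and its children.

    The predicates "is_a" and "contains" are made up for this sketch.
    """
    name = element.get("name")
    yield (name, "is_a", element.tag)
    if parent_name is not None:
        yield (parent_name, "contains", name)
    for child in element:
        yield from to_triples(child, name)


triples = list(to_triples(ET.fromstring(METADATA.strip())))
for triple in triples:
    print(triple)
```

People keep editing the XML they already know, and the triples are derived output that can be regenerated at any time.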

regards,

Adam



Re: [RT] module, project, target = repository, module, project...

Posted by Stefan Bodewig <bo...@apache.org>.

On Sat, 16 Apr 2005, Leo Simons <ma...@leosimons.com> wrote:

> Currently, a <module/> element tends to correspond roughly to the
> ant|maven|make definition of a project,

Really?  At least in my mind it still corresponds to a CVS module or
SVN directory.

I really still see <project> in line with the ant|maven|make
definition, conceptually, but we do have problems making it that
because we'd need to run multiple targets.

> It might make sense to make this
> 
>   <module name="ant">
>   <project name="ant">
>     <target name="bootstrap"/>
>     <target name="build"/>
>     <target name="dist"/>
>   </project>
>   </module>

Yes, that would fit my mental model much better, but still, <module>
does not correspond to an Ant build file here.

> I'm tempted to do a radical remodelling of our metadata structure to
> remove this kind of ambiguity, even going as far as having
> conventions like project-name-is-file-name be gently enforced.

Take a look at the number of jars created by the dist-ant project.
No, I don't want to create one project for each of them, in particular
since they all get created with a single Ant target.

> Oh, ehm, I was even briefly tempted to turn our model into RDF but
> there ain't that many good tools for RDF editing :-D
> 
> Your comments?

Feel free to go wild. 8-)

Stefan


Re: [RT] module, project, target = repository, module, project...

Posted by Brett Porter <br...@gmail.com>.

> Hmm. One problem I see is that the rest of the world (ie make users, python
> developers) don't follow that model at all, and would have to make a
> significant adjustment to start thinking about "groups" and what are the
> groupings in their software.

group = product, project = thing that I build, artifact = thing that
comes out of my build

I actually thought that was a superset. Do you have an example from
those that doesn't fit? I thought also that python used a namespace -
equivalent to group, and packaging - equivalent to artifact, but I
don't know it well. Also bear in mind that the group doesn't need to
map directly to something, as long as whatever is picked is consistent
with the spirit of the scheme and whatever gump is interacting with.

> Secondly, "everything has a unique id". It doesn't necessarily matter if it's
> autogenerated, but it's vital to do semantic-web-like stuff.

I'm not sure what you are quoting here, but if you want to have a
single unique id for each thing, I'd suggest it should be on the
artifact, not the project.

> Does that make sense?

I'm a little confused, but I think so.

Looking at this from the perspective of a Maven user (more so than as a
developer), what I want is to not have to specify gump <-> Maven
mappings. I imagine this would be the same for any project that has
some sort of identification scheme of its own, rather than just
selecting them as they are added to gump.

What that means is that gump needs to internally take Maven IDs and
convert them to gump IDs (so a gump descriptor for Ant would need to
describe its Maven repository IDs, rather than the other way around as
it is now).

The other alternative is for the IDs to match - either by gump using
the Maven ones, or by Maven changing its repository.

I'm thinking the first option is the most realistic, and guards
against rogue naming by being able to add mappings.
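
To make the first option concrete, here is a small sketch of a lookup that converts Maven IDs to gump IDs. Everything in it is an assumption for illustration: the `OVERRIDES` table, its entries, the `gump_id` function, and the default rule that the artifact ID is used as the gump name when no mapping exists:

```python
# Explicit mappings for names that do not follow the default rule.
# These entries are illustrative, not taken from real gump metadata.
OVERRIDES = {
    ("ant", "ant"): "ant",
    ("commons-collections", "commons-collections"): "jakarta-commons-collections",
}


def gump_id(group_id, artifact_id):
    """Map a Maven (groupId, artifactId) pair to a gump project name."""
    key = (group_id, artifact_id)
    if key in OVERRIDES:
        return OVERRIDES[key]
    # Assumed default convention: the artifact ID doubles as the gump name.
    return artifact_id


print(gump_id("commons-collections", "commons-collections"))
print(gump_id("junit", "junit"))
```

The override table is where "rogue" names would be handled, so neither gump nor Maven has to rename anything.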

Am I on the right page here for your goals?

Cheers,
Brett


Re: [RT] module, project, target = repository, module, project...

Posted by Leo Simons <ma...@leosimons.com>.

On 17-04-2005 02:56, "Brett Porter" <br...@gmail.com> wrote:
<snip/>
> http://wiki.apache.org/gump/MavenId
> 
> What do you think of using the group ID and artifact ID ideas in gump?
> 
> So:
> - a group is a collection of projects, and a project builds one or
> more artifacts.
> - group probably equates to a repository, currently
> - project is a build, but has no ID of its own, just a path relative
> to the repository
> - artifact equates to the jar id in gump now
> - all internal references are by groupID + artifactID

Hmm. One problem I see is that the rest of the world (ie make users, python
developers) don't follow that model at all, and would have to make a
significant adjustment to start thinking about "groups" and what are the
groupings in their software.

Secondly, "everything has a unique id". It doesn't necessarily matter if it's
autogenerated, but it's vital to do semantic-web-like stuff.

I think what we're looking for is the superset of all the different
organisational models that all software developers use for their software
builds. Gump has that now, but it leads to inconsistencies. The maven model
is a model (a good one), but not a superset of all possibilities.

Does that make sense?

Cheers,

Leo


Re: [RT] module, project, target = repository, module, project...

Posted by Brett Porter <br...@gmail.com>.

Hi Leo,

On 4/17/05, Leo Simons <ma...@leosimons.com> wrote:
> The more I look at this, the more I dislike how all this is set
> up, esp. as it's not very consistent across different projects, and we don't
> have very clear guidelines on how people should be doing this (other than
> "copy existing practices"). It's a little messy.
> 
> I'm tempted to do a radical remodelling of our metadata structure to remove
> this kind of ambiguity, even going as far as having conventions like
> project-name-is-file-name be gently enforced.

This was exactly the problem that was encountered in attempting to
match gump IDs to Maven IDs - firstly that there was inconsistent
naming, but also because they used a different scheme. I've detailed
this, and possible solutions in either Maven or Gump here:
http://wiki.apache.org/gump/MavenId

What do you think of using the group ID and artifact ID ideas in gump?

So:
- a group is a collection of projects, and a project builds one or
more artifacts.
- group probably equates to a repository, currently
- project is a build, but has no ID of its own, just a path relative
to the repository
- artifact equates to the jar id in gump now
- all internal references are by groupID + artifactID

Note also that groups are hierarchical now, so while jakarta-commons
may be a group, so is jakarta-commons-jelly (as long as a group can
contain a group, this will work from gump perspective also).
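
A sketch of how the list above might look as a data model, in Python. The class names, the `index` helper, and the sample path and artifact name are all hypothetical; the only points taken from the proposal are that groups can nest, that a project has a path but no ID, and that references go through groupID + artifactID:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Artifact:
    artifact_id: str


@dataclass
class Project:
    path: str  # relative to the group/repository; no ID of its own
    artifacts: List[Artifact] = field(default_factory=list)


@dataclass
class Group:
    group_id: str
    parent: Optional["Group"] = None  # groups may be nested
    projects: List[Project] = field(default_factory=list)


def index(groups):
    """Build a lookup table keyed by (groupID, artifactID)."""
    table: Dict[Tuple[str, str], Project] = {}
    for group in groups:
        for project in group.projects:
            for artifact in project.artifacts:
                table[(group.group_id, artifact.artifact_id)] = project
    return table


commons = Group("jakarta-commons")
jelly = Group("jakarta-commons-jelly", parent=commons)
jelly.projects.append(Project("jelly", [Artifact("commons-jelly")]))

lookup = index([commons, jelly])
print(lookup[("jakarta-commons-jelly", "commons-jelly")].path)
```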

WDYT?

Cheers,
Brett


Re: RDF

Posted by Stefano Mazzocchi <st...@apache.org>.

Leo Simons wrote:
> On 17-04-2005 00:53, "Stefano Mazzocchi" <st...@apache.org> wrote:
> 
>>Example, if you have the Module object and the Project object, you have
>>to decide which way the link goes and the notion of "Module.projects"
>>means, this is the list of projects this module contains.
>>
>>Problem is that this implicit modeling forces you to decide the
>>direction of the link, and, in case you want both, you have to model
>>this explicitly and at update, you need to know where to change.
>>
>>In RDF, you don't have to do all that.
> 
> 
> Exactly! If you want a bi-directional link you have to model it explicitly
> and it is always very evident when using it, ie
> 
>   project.module.repository.workspace.name
> 
> Just yells "You're handling a project and accessing something related to the
> workspace. Why is that????" right at ya.

Yep.

> One thing that got Gump2 into trouble was that things were relatively
> tightly coupled to one another. Having "manual" modelling means that it's easy to
> spot that coupling (just delete all links from repository->workspace, run
> your project-related code, boom, it blows up).
> 
> As with databases, I (model designer) have to work real hard so the plugin
> programmer has an easier time. Interestingly...
> 
>>I find it somewhat ironic that you now code in a dynamically typed
>>language (and, AFAIK, with good feelings about it) and you advocate that
>>static typing of your data (object or SQL doesn't really matter) is
>>better for you.
> 
> 
> I hadn't realised that this clearly just yet. I've been consciously making a
> lot of things statically typed to keep it understandable. Now...
> 
> <snip/>
> 
>>  failed_builds = model.get("?x is_a Build where ?x status 'failed'")
> 
> 
> Is indeed quite understandable. At least I had no problem understanding that
> when I first saw it.

Glad to hear that. I find it quite understandable myself, but only when 
you remove all the complexity that is introduced by the fact that all 
those things need to be globally unique URIs. Luckily some APIs came to 
the rescue.

>>Sure, the argument that objects are better than dealing with JDBC
>>resultsets by hand stands, but making this a general rule could turn
>>out to be a mistake.
> 
> Do you know of an open-source reasonably sized RDF-model-based application
> that follows the approach you're describing? I'd like to see how it turns
> out! I was looking at Haystack the other day but uhm, it suffers from all of
> those "research project" flaws.

eheh, well, we are building one as we speak, but can't tell you more :-)

Let me just say that we have been dealing with as many as 30 million 
statements and as long as your queries are reasonable (say you don't 
iterate over all of the nodes!), the performance is reasonable as well.

Haystack tried to do too much (they are modelling their entire system, 
including the UI, with RDF statements... which means that it's pretty
painful to do anything).

> Same comment again....
> 
>>I find it somewhat ironic that you now code in a dynamically typed
>>language (and, AFAIK, with good feelings about it) and you advocate that
>>static typing of your data (object or SQL doesn't really matter) is
>>better for you.
> 
> 
> You know, I still have mixed feelings about a lot of that. I have read so
> much python code recently that is hard to understand because it's really
> dynamic, often for no good reason. And I've also seen a lot of python code
> look really bad because developers want to add security in there that can't
> truly be enforced (ie Zope). And a whole lot of python code that is horribly
> structured simply because you can do a lot of "glueing" so easily.
> 
> On a code level scale, working with python can be real fun once you get the
> hang of it, but every time I write something like
> 
>   for command in [command for command in commands \
>       if isinstance(command,Script)]:
>     handle_script(command)
> 
> (which is kinda "pythonic")
> I do wonder whether
> 
>   it = commands.iterator();
>   while(it.hasNext()) {
>     command = it.next();
>     if(command instanceof Script)
>       handleScript(command);
>   }
> 
> doesn't make more sense if there are other developers that have to understand
> the code.

True enough.

All I can tell is that semi-structures deal with the entropy of things a 
lot better than forcing structure on top of them: "refactoring" data in 
a triple store could be as easy as writing a few other owl:sameAs 
statements between node types and running an inferencing engine on it 
(maybe for a few hours or a day... *while* the system is still running).

-- 
Stefano.



Re: RDF

Posted by Leo Simons <ma...@leosimons.com>.

On 17-04-2005 00:53, "Stefano Mazzocchi" <st...@apache.org> wrote:
> Example, if you have the Module object and the Project object, you have
> to decide which way the link goes and the notion of "Module.projects"
> means, this is the list of projects this module contains.
> 
> Problem is that this implicit modeling forces you to decide the
> direction of the link, and, in case you want both, you have to model
> this explicitly and at update, you need to know where to change.
> 
> In RDF, you don't have to do all that.

Exactly! If you want a bi-directional link you have to model it explicitly
and it is always very evident when using it, ie

  project.module.repository.workspace.name

Just yells "You're handling a project and accessing something related to the
workspace. Why is that????" right at ya.

One thing that got Gump2 into trouble was that things were relatively
tightly coupled to one another. Having "manual" modelling means that it's easy to
spot that coupling (just delete all links from repository->workspace, run
your project-related code, boom, it blows up).
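
A small sketch of that argument with hand-written Python classes. The class layout and the workspace name are assumptions for illustration, not the actual Gump3 model; the point is only that explicit, one-directional links make coupling visible and easy to break on purpose:

```python
class Workspace:
    def __init__(self, name):
        self.name = name


class Repository:
    def __init__(self, name, workspace):
        self.name = name
        self.workspace = workspace  # explicit upward link


class Module:
    def __init__(self, name, repository):
        self.name = name
        self.repository = repository


class Project:
    def __init__(self, name, module):
        self.name = name
        self.module = module


workspace = Workspace("brutus")  # hypothetical workspace name
project = Project("ant", Module("ant", Repository("ant", workspace)))
print(project.module.repository.workspace.name)

# Sever the repository -> workspace link: project-related code that
# silently depended on the workspace now fails loudly.
project.module.repository.workspace = None
try:
    project.module.repository.workspace.name
except AttributeError as error:
    print("coupling found:", error)
```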

As with databases, I (model designer) have to work real hard so the plugin
programmer has an easier time. Interestingly...
> I find it somewhat ironic that you now code in a dynamically typed
> language (and, AFAIK, with good feelings about it) and you advocate that
> static typing of your data (object or SQL doesn't really matter) is
> better for you.

I hadn't realised that this clearly just yet. I've been consciously making a
lot of things statically typed to keep it understandable. Now...

<snip/>
>   failed_builds = model.get("?x is_a Build where ?x status 'failed'")

Is indeed quite understandable. At least I had no problem understanding that
when I first saw it.

> Sure, the argument that objects are better than dealing with JDBC
> resultsets by hand stands, but making this a general rule could turn
> out to be a mistake.

Do you know of an open-source reasonably sized RDF-model-based application
that follows the approach you're describing? I'd like to see how it turns
out! I was looking at Haystack the other day but uhm, it suffers from all of
those "research project" flaws.

Same comment again....
> I find it somewhat ironic that you now code in a dynamically typed
> language (and, AFAIK, with good feelings about it) and you advocate that
> static typing of your data (object or SQL doesn't really matter) is
> better for you.

You know, I still have mixed feelings about a lot of that. I have read so
much python code recently that is hard to understand because it's really
dynamic, often for no good reason. And I've also seen a lot of python code
look really bad because developers want to add security in there that can't
truly be enforced (ie Zope). And a whole lot of python code that is horribly
structured simply because you can do a lot of "glueing" so easily.

On a code level scale, working with python can be real fun once you get the
hang of it, but every time I write something like

  for command in [command for command in commands \
      if isinstance(command,Script)]:
    handle_script(command)

(which is kinda "pythonic")
I do wonder whether

  Iterator it = commands.iterator();
  while (it.hasNext()) {
    Object command = it.next();
    if (command instanceof Script)
      handleScript((Script) command);
  }

doesn't make more sense if there are other developers that have to understand
the code.

G'day!

LSD


Re: RDF

Posted by Stefano Mazzocchi <st...@apache.org>.

Leo Simons wrote:

[snip]

> So, ehm, no, I don't actually think it'll be a tremendous win. It'll bring
> some huge benefits, but it'll incur a big cost as well. Simplicity loss.
> 
> Or maybe not. I'm not exactly an expert here. We do have one of those around
> I think. Hence: "Show me!"

The way you deal with statements is a little different than the way you 
deal with objects. Objects have explicit semantics, as much as 
statements, but their relationships are not typed.

For example, if you have the Module object and the Project object, you have
to decide which way the link goes: the notion of "Module.projects"
means "this is the list of projects this module contains".

The problem is that this implicit modeling forces you to decide the
direction of the link, and, in case you want both, you have to model
this explicitly and, at update, you need to know where to change.

In RDF, you don't have to do all that. If you have a bunch of statements

  ModuleA -(is_a)-> Module
  ProjectA -(is_a)-> Project
  ModuleA -(contains)-> ProjectA
  ProjectA -(has_name)-> "Cocoon"@en^string
  Build-20050415-343 -(is_a)-> Build
  Build-20050415-343 -(built)-> ProjectA
  Build-20050415-343 -(status)-> "failed"@en^string
  Build-20050415-343 -(depends)-> Build-20050415-234
  ...

and so on. It's basically a log of the things you come to know about 
stuff and this becomes your knowledge base. No structure, you don't need 
it, you just need to be careful about how you model things and this 
becomes natural and grows with you. No need to define the objects nor 
the schema before you know how complex your data is.

Very incremental, very XP, fits nicely both in the laziness mode and in
the separation between data production and data consumption that we want 
to enforce in Gump3.

Now, what about the data consumption side?

Well, the data is in the triple store, so you need to query it. There 
are many different ways to do this, but two main categories:

  1) via an API
  2) via a query language

depending on the triple store you use, you get a different API and/or 
query language. The API feels more natural, but can be less optimized by 
the triple store.

For example (pseudocode)

Get all modules:
  modules = model.getSubjects("is_a", "Module")

Get all builds that failed:
  failed_builds = []
  builds = model.getSubjects("is_a", "Build")
  foreach (build in builds):
    statuses = model.getObjects(build, "status")
    if ("failed" in statuses):
      failed_builds.add(build)

you get the idea.

But you could also do something like

  failed_builds = model.get("?x is_a Build where ?x status 'failed'")

which is not that hard to get.
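
For anyone who wants to play with the API style, here is a toy in-memory version in Python. It is a sketch only: the `Model` class and its method names are invented to mirror the pseudocode, it is not a real RDF library, plain strings stand in for URIs and literals, and the second build with status "success" is made up so the filter has something to exclude:

```python
class Model:
    """A toy in-memory triple store (a sketch, not a real RDF library)."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def get_subjects(self, predicate, obj):
        """All subjects that have the given predicate and object."""
        return {s for (s, p, o) in self.triples if p == predicate and o == obj}

    def get_objects(self, subject, predicate):
        """All objects stated for the given subject and predicate."""
        return {o for (s, p, o) in self.triples if s == subject and p == predicate}


model = Model()
model.add("ModuleA", "is_a", "Module")
model.add("ProjectA", "is_a", "Project")
model.add("ModuleA", "contains", "ProjectA")
model.add("Build-20050415-343", "is_a", "Build")
model.add("Build-20050415-343", "built", "ProjectA")
model.add("Build-20050415-343", "status", "failed")
model.add("Build-20050415-234", "is_a", "Build")
model.add("Build-20050415-234", "status", "success")  # assumed status

failed_builds = {
    build
    for build in model.get_subjects("is_a", "Build")
    if "failed" in model.get_objects(build, "status")
}
print(failed_builds)
```

The same two calls answer the question in either direction: `model.get_objects("ModuleA", "contains")` lists the projects in a module, and `model.get_subjects("contains", "ProjectA")` finds the module that holds a project, without anyone having decided up front which way the link goes.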

Objects are just syntax sugar around SQL statements: you have to model 
your data first, then add it in. In RDF it is the other way around: you
pile up your data and the database follows you.

Sure, the argument that objects are better than dealing with JDBC 
resultsets by hand stands, but making this a general rule could turn
out to be a mistake.

The vision of RDF is data first, metadata later. The vision of 
relational databases is metadata first, data later.

And the funny thing is that there is nothing in the relational model
that suggests that (in fact, RDF is nothing but an explicit
relational model with globally unique identifiers), but the idea of
building a database by creating a schema was driven by the vision that
static typing is good for you even if it locks you in (it certainly is
good for the query indexers, and performance is clearly not the best
feature of a triple store nowadays).

I find it somewhat ironic that you now code in a dynamically typed 
language (and, AFAIK, with good feelings about it) and you advocate that 
static typing of your data (object or SQL doesn't really matter) is 
better for you.

I think RDF offers a better model, especially for something integrating 
data and metadata from different independent domains like Gump.

But of course, I'm biased.

-- 
Stefano.


Re: RDF

Posted by Leo Simons <ma...@leosimons.com>.

On 16-04-2005 21:59, "Stefano Mazzocchi" <st...@apache.org> wrote:
> Leo Simons wrote:
>> On 16-04-2005 18:30, "Stefano Mazzocchi" <st...@apache.org> wrote:
>> 
>>> The more I think about it, the more I think that having our data in RDF
>>> would be a tremendous win, also in terms of programming.
>>  
>> Show me!
> 
> Nice try ;-)

Yeah I thought so :-D

I just spent some time trying to envision what gump would look like codewise with
an RDF triplestore at its core. It would be a lot more like an application
that uses a database for all its storage, except that the database stores
triples instead of rows. You'd then have lots of RDQL (or similar) queries
sprinkled throughout the codebase.

That wouldn't look very nice or easy to understand at all. Adam and now I in
his footsteps have worked pretty hard to make the distance between the
conceptual model (in the form of clean python objects) and its XML
representation huge, simply because that makes the majority of the code a
lot easier to understand.

Using RDF at the core instead of an object model would mean you would need
to understand RDF and how we map our conceptual model onto RDF in order to
be productive in development. That would not be nice. We have enough
concepts in there already.

Unless, of course, you could build a "magic" autogenerated model where
property setting and getting actually triggers interaction with the RDF
datastore. Not magical object-relational but magical object-triple mapping.
And, once you go there, it turns out that it doesn't matter that much right
now whether we move to RDF or not; we can just develop our plugins against
the "manual model" and do something "magic" later.
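
To give an idea of what such an object-triple mapping might look like, here is a sketch in Python. The `TripleBacked` class is hypothetical, the "datastore" is just a set of tuples, and each subject/predicate pair is assumed to hold a single value, which real RDF does not require:

```python
class TripleBacked:
    """Object whose attributes are read from and written to a shared triple set."""

    def __init__(self, store, subject):
        # Bypass our own __setattr__ for the two real instance attributes.
        object.__setattr__(self, "_store", store)
        object.__setattr__(self, "_subject", subject)

    def __getattr__(self, predicate):
        # Only called when normal attribute lookup fails.
        for (s, p, o) in self._store:
            if s == self._subject and p == predicate:
                return o
        raise AttributeError(predicate)

    def __setattr__(self, predicate, value):
        # Replace any existing statement for this subject/predicate pair.
        stale = {t for t in self._store
                 if t[0] == self._subject and t[1] == predicate}
        self._store.difference_update(stale)
        self._store.add((self._subject, predicate, value))


store = set()
project = TripleBacked(store, "ProjectA")
project.name = "Cocoon"
project.name = "Ant"
print(project.name, store)
```

Plugin code written against `project.name` would not need to change if the plain attribute were later swapped for something like this, which is why the decision can be deferred.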

You may know I'm a little shy about "magic" (where's my little essay on that
again :-D); experience showed that a very smart sax-based xml-querying
automodelling is very possible (sam wrote one, remember) and very hard to
understand.

So, ehm, no, I don't actually think it'll be a tremendous win. It'll bring
some huge benefits, but it'll incur a big cost as well. Simplicity loss.

Or maybe not. I'm not exactly an expert here. We do have one of those around
I think. Hence: "Show me!"

:-D

Cheers,

LSD


Re: RDF

Posted by Stefano Mazzocchi <st...@apache.org>.

Leo Simons wrote:
> On 16-04-2005 18:30, "Stefano Mazzocchi" <st...@apache.org> wrote:
> 
>>The more I think about it, the more I think that having our data in RDF
>>would be a tremendous win, also in terms of programming.
>  
> Show me!

Nice try ;-)

-- 
Stefano.



RDF (was: [RT] module, project, target = repository, module, project...)

Posted by Leo Simons <ma...@leosimons.com>.

On 16-04-2005 18:30, "Stefano Mazzocchi" <st...@apache.org> wrote:
> The more I think about it, the more I think that having our data in RDF
> would be a tremendous win, also in terms of programming.

Show me!




Re: [RT] module, project, target = repository, module, project...

Posted by Stefano Mazzocchi <st...@apache.org>.

Leo Simons wrote:

> Oh, ehm, I was even briefly tempted to turn our model into RDF but there
> ain't that many good tools for RDF editing :-D

The more I think about it, the more I think that having our data in RDF 
would be a tremendous win, also in terms of programming. There are 
python triple stores that are just fine and would allow you to load all 
the data in RDF, then *query* it for the stuff you need.

No object model work, just look for the statements you want (example: 
give me all the projects in this repository, or give me all the projects
that depend on this other project).

How do we get there?

Well, I would suggest RDFizing our XML data by writing a few XSLT
stylesheets and running them over it.

So, instead of having people write (and learn!) RDF, you can just
have them write their XML data as they did in the past.

That gets you started without having to convince people ;-)

-- 
Stefano.
