Posted to general@gump.apache.org by Danny Ayers <da...@gmail.com> on 2004/08/28 20:26:21 UTC

re. RDF 102 s.v.p...

Big flashing lights on my radar - I'm a fan of RDF, and Gump is an
exceedingly neat idea. So one or two comments on top of Stefano's
comments (sorry about the quoting):

[[
> basically this is like the RSS and Atom feeds that Gump puts out, 
> except they also have data at the module level (for all projects within 
> module). Basically, I figured that folks might sometimes want specific 
> information, and sometimes want it all (to feed into some store).

Kewl.
]]

Yep.

[[
> I started out pretty simply, Gump defines some classes (Project, 
> Repository) and some properties (e.g. name) and then makes some 
> statements (Project:X depends upon Project:Y, Project:X resides within 
> Repo:Z). Nothing complicated, but a start.

nice.
]]

The vocab is nice. A while back I did some work on a project vocab [1]
but spent far too much time on the terms and not enough on doing stuff
with it - got bogged down, bloat, overengineered. So I reckon just
starting with a few terms and actually *using* them is the best route
forward.

[[
> Even this small foray allowed me to come up with some questions, and 
> want more input:

<semantic-web-hat mode="on">

Copying Dirk since he's a semweb fan as much as I am.

> Some areas to look into:
> 
> 1) Design Decisions/Questions:
> 
> 1.1) Ought we define the URI for a project (or other entity) to point to 
> the standalone RDF for that entity? I'm sure there is no need to, but it 
> might allow tools to discover upon demand.

This would be a URL and my suggestion would be something like

http://gump.apache.org/data/path/project/20040827
]]

It could well be helpful, but I'd suggest being careful about what
statements are made involving the URI - i.e. does that resource
actually identify the project? If so a direct rdf:about to it is ok,
but otherwise a technique that's getting popular is to have an
rdfs:seeAlso to it. There's not the same level of commitment, but
automatic tools can still pick up the data. (Other statements such as
the type of the resource can also be made, but aren't necessary from
day one).


[[
> 1.2) What if there are two sources of RDF triples about an entity? Say 
> we have facts in a standalone document, and in a shared one (or in a 
> triple store)? Are triples merged? 

Yes, that's the beauty of the RDF model: you can have statements coming 
from different sources, and they get aggregated.

> What if they clash with each other? 
> [e.g. one source says X dependsOn Y, but another says Y dependsOn X or 
> something contradictory?]
[snip]

]]

Just to expand on Stefano's comments a little - the basic RDF model
would in effect see both/all the statements as being true. ANDed if
you like. There are tricks (particularly at the OWL level) which can
be used to spot inconsistencies using general-purpose inferencing, but
then again there's nothing to stop more application-specific reasoning
being coded on top of the RDF data model.
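
As a rough illustration of that last point, here's a minimal Python
sketch (plain tuples, no RDF library) that merges triples from two
sources and then runs an application-specific check for mutual
dependsOn statements. The project names and the "gump:dependsOn"
string are placeholders of mine, not real Gump output.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
DEPENDS_ON = "gump:dependsOn"

standalone_doc = {
    ("project:X", DEPENDS_ON, "project:Y"),
    ("project:X", "gump:name", "X"),
}
shared_doc = {
    ("project:Y", DEPENDS_ON, "project:X"),
    ("project:X", "gump:name", "X"),  # same statement again, merges away
}


def merge(*graphs):
    """RDF-style merge: the union of all statements, all taken as true."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def mutual_dependencies(graph):
    """Application-level check: pairs where A dependsOn B and B dependsOn A."""
    clashes = set()
    for s, p, o in graph:
        if p == DEPENDS_ON and (o, DEPENDS_ON, s) in graph:
            clashes.add(frozenset((s, o)))
    return clashes


merged = merge(standalone_doc, shared_doc)
clashes = mutual_dependencies(merged)
```

The merge itself never complains; it's only the extra check, which
encodes the assumption that circular dependencies are a problem, that
flags the X/Y pair.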


[[
> 1.3) How do we define a URI to represent a long lived (yet varying) 
> entity? 

eheh, great question ;-)
]]

Easy! My home page is http://dannyayers.com. The representations of it
(the HTML & RSS) vary a lot, but conceptually it's the same entity.

[[
> Ought we (say) include the version of Cocoon in the URI, so we 
> know facts about that release/state, or do we just say Cocoon? 

I'm a big fan of numerical URIs for long-term persisting things. The 
less implicit semantics in the URI, the higher the chance of surviving 
changes without requiring the URI to change.
]]

If I understand Stefano correctly, I agree - if it's worth saying,
make it explicit in some other way as additional statements, the URI
is opaque to (most) machine processing.

[[
> If Cocoon 
> dependsOn Avalon today, but not tomorrow, what happens to the Cocoon 
> dependsOn Avalon triple? Is it wrong? Expired?

This is where it starts to get very tricky.
]]

Yup ;-)

[provenance through reification snipped]

[[
Well, I would just create a new model every time, 
...
]]

Yep, that's the easiest, only retain the current version of the
model/graph/document at the location which is going to have its data
processed.

Another alternative would be to wrap up the dependencies in a little
cluster with a timestamp, something like:

 Snapshot  containsDependency Avalon
 Snapshot  date 2004-08-28

I suspect this may be complicating matters unnecessarily though -
letting the triples 'expire' through their absence in the latest
version is a lot easier. There's some doc on this kind of thing at
[2].
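
For what it's worth, the snapshot idea might look something like this
in Python, using plain tuples as triples. The "snapshot:..." and
"gump:..." identifiers are invented for the sketch, and the second
day's data is made up to show a dependency dropping out.

```python
# Hypothetical identifiers throughout; this is a sketch, not Gump's data model.
CONTAINS = "gump:containsDependency"
DATE = "gump:date"


def snapshot_triples(snapshot_id, date, dependencies):
    """Wrap one day's dependencies in a dated snapshot node."""
    triples = {(snapshot_id, DATE, date)}
    for dep in dependencies:
        triples.add((snapshot_id, CONTAINS, dep))
    return triples


history = set()
history |= snapshot_triples(
    "snapshot:cocoon-1", "2004-08-28", ["project:avalon", "project:xerces"]
)
history |= snapshot_triples("snapshot:cocoon-2", "2004-08-29", ["project:xerces"])


def dependencies_on(graph, date):
    """Collect the dependencies held by every snapshot carrying the given date."""
    snapshots = {s for s, p, o in graph if p == DATE and o == date}
    return {o for s, p, o in graph if p == CONTAINS and s in snapshots}
```

Here dependencies_on(history, "2004-08-28") includes Avalon and the
next day's does not, but every query now has to go through the
snapshot node, which is the extra complication mentioned above.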

[[
> 2) Ongoing investigations:
> 
> 2.1) I think we wish to define a Gump Ontology at 
> 'http://gump.apache.org/schemas/main/1.0/'? I am still a little confused 
> by OWL and/or RDFS, and I know there is no immediate  need to hurry. I 
> guess I feel without an Ontology we are speaking a language foreign to 
> everybody, but that is ok as we learn to speak. That said, how do we go 
> about refining this? Just set it out there and tinker?

I would not worry about this for now, just like you don't need an 
XMLSchema to write some well-formed XML.
]]

Yep, tinker.

[[
> 2.2) I think we wish to map the Gump Ontology to DOAP and others (even 
> parts of FOAF). How would we do that

with some OWL ontologies.
]]

and/or RDF schema.
For example,  in your schema/ontology you could say:

gump:Project rdfs:subClassOf doap:Project

or

doap:Project rdfs:subClassOf  gump:Project

or *both*. By asserting both you'd be saying that every individual in
the set of doap:Projects is also a member of the set of
gump:Projects, and vice versa. You can say the same thing using:

doap:Project owl:equivalentClass gump:Project

There is another candidate relationship, owl:sameAs, but this says
that the two individuals (in this case classes) are the same. This can
be problematic both conceptually (are you sure the classes are
*exactly* the same?) and sometimes in practice (the DL breed of
reasoners tend to choke). It's not written in stone anywhere, but
folks who generally know what they're doing (like Dan Brickley) tend
to avoid owl:sameAs for this purpose.
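
To make the "both directions" point concrete, here's a minimal Python
sketch of the RDFS subclass rule (if x is a C, and C is a subClassOf
D, then x is a D) run over plain tuples. The two individuals are made
up, and the loop is just my illustration, not how any particular
reasoner works.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

schema = {
    ("gump:Project", SUBCLASS, "doap:Project"),
    ("doap:Project", SUBCLASS, "gump:Project"),
}
data = {
    ("project:xml-xerces", TYPE, "gump:Project"),      # illustrative individual
    ("project:some-doap-thing", TYPE, "doap:Project"),  # illustrative individual
}


def infer_types(graph):
    """Apply the subclass rule repeatedly until no new rdf:type triple appears."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        subclass_pairs = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        for x, p, cls in list(inferred):
            if p != TYPE:
                continue
            for sub, sup in subclass_pairs:
                if sub == cls and (x, TYPE, sup) not in inferred:
                    inferred.add((x, TYPE, sup))
                    changed = True
    return inferred


closure = infer_types(schema | data)
```

With both subClassOf statements present, each individual ends up typed
as both gump:Project and doap:Project, which is the effect
owl:equivalentClass states in one go.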

You may be able to reuse(/hijack!) some of the DOAP authoring tools.
There's no reason Gump can't use the same syntax style (with equivalent
meaning to your examples):

<Project rdf:about="http://apache.org/gump/project/xml-xerces/"
   xmlns="http://gump.apache.org/schemas/main/1.0/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
>
    <dependsOn rdf:resource="http://apache.org/gump/project/xjavac/"/>
    <dependsOn rdf:resource="http://apache.org/gump/project/bootstrap-ant/"/>
    <residesWithin rdf:resource="http://apache.org/gump/repository/xml/"/>
    <name>xml-xerces</name>
</Project>
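
If you just want to pull the statements out of a document shaped like
that, a minimal Python sketch with the standard library's ElementTree
could do it. This is not a real RDF/XML parser: it assumes exactly
this flat shape (one typed node with rdf:about, simple child
properties), and the function name is my own.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

DOC = """<Project rdf:about="http://apache.org/gump/project/xml-xerces/"
   xmlns="http://gump.apache.org/schemas/main/1.0/"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <dependsOn rdf:resource="http://apache.org/gump/project/xjavac/"/>
    <dependsOn rdf:resource="http://apache.org/gump/project/bootstrap-ant/"/>
    <residesWithin rdf:resource="http://apache.org/gump/repository/xml/"/>
    <name>xml-xerces</name>
</Project>"""


def uri(tag):
    """ElementTree tags look like '{namespace}local'; join them into one URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def flat_triples(xml_text):
    """Read one typed node and its direct properties as (s, p, o) tuples."""
    root = ET.fromstring(xml_text)
    subject = root.get("{%s}about" % RDF_NS)
    triples = [(subject, RDF_NS + "type", uri(root.tag))]
    for child in root:
        resource = child.get("{%s}resource" % RDF_NS)
        # Note: this loses the distinction between a URI and a literal value.
        obj = resource if resource is not None else (child.text or "")
        triples.append((subject, uri(child.tag), obj))
    return triples


triples = flat_triples(DOC)
```

For anything beyond a quick look, a proper RDF toolkit is the safer
choice, since real RDF/XML allows many other shapes.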


[[
> and how would we test/exercise it?

you don't, you just publish your data in the best way possible and see 
what happens ;-)
]]

Hmm, I dunno, this is one to think about. Assuming you had all the
(combined) project data in a store, what kind of questions could you
ask? What if you stuck the RDF into a reasoner and asserted project X
(defined only in DOAP) depends on project Y (defined in Gump),
presumably then X would inherit all the dependencies of Y. The
reasoner may be able to spit out X's dependencies.
Scruffy types tend to play around with this stuff in cwm [3], those
that comb their hair often opt for Protege [4].
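
Without a reasoner to hand, the same question can be sketched in a few
lines of Python over the merged triples. Note the assumption baked in:
dependsOn is treated as transitive, which nothing in the vocab says so
far (in OWL you'd declare it an owl:TransitiveProperty). The project
identifiers are illustrative.

```python
from collections import deque

DEPENDS_ON = "dependsOn"

# Merged data: X comes from a DOAP file, the rest from Gump (names illustrative).
merged = {
    ("doap:X", DEPENDS_ON, "gump:Y"),
    ("gump:Y", DEPENDS_ON, "gump:xjavac"),
    ("gump:Y", DEPENDS_ON, "gump:bootstrap-ant"),
    ("gump:xjavac", DEPENDS_ON, "gump:bootstrap-ant"),
}


def all_dependencies(graph, project):
    """Breadth-first walk over dependsOn, assuming the property is transitive."""
    direct = {}
    for s, p, o in graph:
        if p == DEPENDS_ON:
            direct.setdefault(s, set()).add(o)
    seen = set()
    queue = deque(direct.get(project, ()))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(direct.get(dep, ()))
    return seen


x_deps = all_dependencies(merged, "doap:X")
```

The seen set also keeps the walk from looping forever if the data
happens to contain a dependency cycle.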
    
[[
> 2.3) Ought we consider (over time) an ASF-wide Ontology, perhaps 
> defining TLPs/other communities, and having Gump state triples for this 
> project memberOf this community. [We tend to figure out communities from 
> the repository, e.g. cvs.sf.net or ...]

Adam, keep focus: one thing at a time ;-)
]]

Yes and yes ;-)

[[
> 3) Usages:
> 
> 3.1) I was hoping to work on PSP to do queries into the RDBMS. This is 
> primarily for historical information, but I was thinking about using it 
> for dependency information also.  The more I think about the RDF 
> information, and triple queries, it seems an RDF store might be a better 
> place to hold/maintain and query. This information seems RDF-ish, not 
> RDBMS-ish.

Agreed. I would use a triple store with an RDQL query engine (Redland 
has such a thing and has Python hooks)
]]

Ah, forgot about Redland (again). Yes again.

[[
> 3.2) What other 'users' of this descriptor information seem viable? 
> Ought tools (e.g. Depot) be wishing to figure things out from it? Others?
]]

I wouldn't worry too much about that - generate good data and
applications will emerge. The first batch will probably just be pretty
node & arc visualizations, but they can be useful too...

[[
Once the RDF infrastructure is in place, one of my goals is to add 
"legal" metadata to the project and create an inferencing layer that 
indicates whether or not a project is *legal* depending on the 
combination of the licenses.
]]

Kewl.

Cheers,
Danny.

[1] http://purl.org/stuff/project/
[2] http://www.w3.org/TR/swbp-n-aryRelations/
[3] http://www.w3.org/2000/10/swap/doc/cwm.html
[4] http://protege.stanford.edu/

-- 

http://dannyayers.com

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@gump.apache.org
For additional commands, e-mail: general-help@gump.apache.org

Re: re. RDF 102 s.v.p...

Posted by Danny Ayers <da...@gmail.com>.

On Wed, 01 Sep 2004 16:51:46 -0400, Stefano Mazzocchi
<st...@apache.org> wrote:

> Anyway, rdf:about indicates the subject while rdfs:seeAlso indicates an
> implicit relationship.
> 
> I personally strongly dislike rdfs:seeAlso for that: it doesn't state
> what kind of relationship you have with that URI, it's vague and
> semantically useless.

True, but it's pragmatically useful. An explicit property:

X y:property http://somewhere.org

gives you proper semantics. But it doesn't (by itself) tell you
anything about what, if anything, you get if you do an HTTP GET on the URI.
Strictly speaking neither does seeAlso, but people are tending to put
retrievable RDF documents at the end, so it makes a useful hint. Note
that there's nothing to stop there being *more* information:

X rdfs:seeAlso http://somewhere.org
X y:property http://somewhere.org
http://somewhere.org rdf:type foaf:Document
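
Roughly how a harvesting tool might use that hint, as a minimal Python
sketch over plain tuples: it treats the objects of rdfs:seeAlso as
things worth fetching and ignores other properties. The fourth triple
and the function name are my additions for contrast; nothing is
actually fetched here.

```python
SEE_ALSO = "rdfs:seeAlso"

graph = {
    ("X", SEE_ALSO, "http://somewhere.org"),
    ("X", "y:property", "http://somewhere.org"),
    ("http://somewhere.org", "rdf:type", "foaf:Document"),
    ("X", "y:other", "http://elsewhere.example"),  # illustrative extra statement
}


def fetch_candidates(graph):
    """URIs a crawler might try to GET: only the objects of rdfs:seeAlso."""
    return sorted({o for s, p, o in graph if p == SEE_ALSO})


candidates = fetch_candidates(graph)
```

The y:other target is left alone: the property may be perfectly
meaningful, but by itself it gives the tool no reason to expect RDF at
the other end.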

> In general, I don't like a lot of RDFSchema and I think that OWL Light
> (or even a subset of that, OWL Tiny as some people call it) is a lot
> more coherent than RDFSchema, but that's just me.

Hmm - you are using a DL reasoner?
 
> >>>1.3) How do we define a URI to represent a long lived (yet varying)
> >>>entity?
> >>
> >>eheh, great question ;-)
> >>]]
> >>
> >>Easy! My home page is http://dannyayers.com. The representations of it
> >>(the HTML & RSS) vary a lot, but conceptually it's the same entity.
> 
> Easy?!? c'mon. Easy to implement so that you show something? sure. Easy
> to implement so that the semantic web can really happen? another story.
> Ask the TAG ;-)

TimBL seems reasonably happy ;-)
 
> > But how can I tell you things about how it looked yesterday or last week?
> 
> exactly. As a hack, if you have a date-based URI, you can "infer" that
> if you have
> 
>   http://blah/newsfeed/2004/03/23
> 
> then you can ask for
> 
>   http://blah/newsfeed/2003/03/23
> 
> and get the news of the same day last year. But there is no guarantee
> that this is so.

Quite. Yet if the information is made explicit in RDF statements, you
can be (more) sure.

> Also,
> 
>   http://blah/newsfeed
> 
> might return you the "last" feed, but then you have no idea on how to
> ask for a previous feed.
> 
> The RDF data access WG is supposed to solve this issue, though, but I
> suspect that a clear result won't be found, it will just emerge out of
> de-facto useful practices.

On that you're probably right. But if anyone's going to pull rabbits
out of hats it's those guys.

> I think the ASF *should* start to centralize these things and associate
> persistent URIs to projects. I can push this at the board@ level if
> required.

Ooh, the word 'centralize' is a little worrying. What if the project
moves somewhere else? Are projects really that persistent? (Maybe, I
don't know).
 
Cheers,
Danny.

-- 

http://dannyayers.com


Re: re. RDF 102 s.v.p...

Posted by Stefano Mazzocchi <st...@apache.org>.

Adam R. B. Jack wrote:

>>The vocab is nice. A while back I did some work on a project vocab [1]
>>but spent far too much time on the terms and not enough on doing stuff
>>with it - got bogged down, bloat, overengineered. So I reckon just
>>starting with a few terms and actually *using* them is the best route
>>forward.
> 
> 
> I hear that. Still, amusingly although I'd happily let Gump pump out Atom or
> RSS (all bets off ;-) but I worry about pumping out RDF without a lot more
> thought/design. I fear that a triple could end up permanently in some corner
> of some remote triple store and dork over some logic eons from now. ;-)

well, this could be said for any information you spit out, including 
this email, so I wouldn't worry about it that much ;-)

>>It could well be helpful, but I'd suggest being careful about what
>>statements are made involving the URI - i.e. does that resource
>>actually identify the project? If so a direct rdf:about to it is ok,
>>but otherwise a technique that's getting popular is to have an
>>rdfs:seeAlso to it. There's not the same level of commitment, but
>>automatic tools can still pick up the data. (Other statements such as
>>the type of the resource can also be made, but aren't necessary from
>>day one).
> 
> Interesting. Could we have a dated URI that has a seeAlso to the non-dated
> one? Hmm, if we had those two is it really not an 'about'?

careful here: a system is not required to dereference the URI 
that is included in rdf:about. rdfs:seeAlso is not required to be 
dereferenced either, but there is a much higher chance that it will be.

Anyway, rdf:about indicates the subject while rdfs:seeAlso indicates an 
implicit relationship.

I personally strongly dislike rdfs:seeAlso for that: it doesn't state 
what kind of relationship you have with that URI, it's vague and 
semantically useless.

In general, I don't like a lot of RDFSchema and I think that OWL Light 
(or even a subset of that, OWL Tiny as some people call it) is a lot 
more coherent than RDFSchema, but that's just me.

>>>1.3) How do we define a URI to represent a long lived (yet varying)
>>>entity?
>>
>>eheh, great question ;-)
>>]]
>>
>>Easy! My home page is http://dannyayers.com. The representations of it
>>(the HTML & RSS) vary a lot, but conceptually it's the same entity.

Easy?!? c'mon. Easy to implement so that you show something? sure. Easy 
to implement so that the semantic web can really happen? another story. 
Ask the TAG ;-)

> But how can I tell you things about how it looked yesterday or last week? 

exactly. As a hack, if you have a date-based URI, you can "infer" that 
if you have

  http://blah/newsfeed/2004/03/23

then you can ask for

  http://blah/newsfeed/2003/03/23

and get the news of the same day last year. But there is no guarantee 
that this is so.

Also,

  http://blah/newsfeed

might return you the "last" feed, but then you have no idea on how to 
ask for a previous feed.

The RDF data access WG is supposed to solve this issue, though, but I 
suspect that a clear result won't be found, it will just emerge out of 
de-facto useful practices.

> don't think I can w/o adding that information separately (storing version in
> an SCM, or whatever). I feel I want Gump to do that work for others.

Maybe not gump itself, but Bubba or something: a web service exposing 
the gump metadata in a more useful way.

>>I'm a big fan of numerical URIs for long-term persisting things. The
>>less implicit semantics in the URI, the higher the chance of surviving
>>changes without requiring the URI to change.
>>]]
>>
>>If I understand Stefano correctly, I agree - if it's worth saying,
>>make it explicit in some other way as additional statements, the URI
>>is opaque to (most) machine processing.
> 
> 
> Seems counter to the XML goal of 'human readable', but I can understand. 

RDF is (almost by definition) definitely not targeted at humans ;-)

> So,
> we'd generate an ID for a project, and have a triple assigning its name
> (which could change). Might be good for tracking projects as they mature up
> to TLP and such. That said, could we have two types? ID provided and
> free-form so we don't need to centralize this?

I think the ASF *should* start to centralize these things and associate 
persistent URIs to projects. I can push this at the board@ level if 
required.

>>doap:Project owl:equivalentClass gump:Project
>>
> 
> 
> Now this makes sense. They are both the same class of thing, we just have a
> few different views on properties (perhaps) for a while.

Exactly. Just work on your stuff, then we draw equivalences so that 
external inferencing engines don't note the differences.

-- 
Stefano.

Re: re. RDF 102 s.v.p...

Posted by "Adam R. B. Jack" <aj...@apache.org>.

> The vocab is nice. A while back I did some work on a project vocab [1]
> but spent far too much time on the terms and not enough on doing stuff
> with it - got bogged down, bloat, overengineered. So I reckon just
> starting with a few terms and actually *using* them is the best route
> forward.

I hear that. Still, amusingly although I'd happily let Gump pump out Atom or
RSS (all bets off ;-) but I worry about pumping out RDF without a lot more
thought/design. I fear that a triple could end up permanently in some corner
of some remote triple store and dork over some logic eons from now. ;-)

> It could well be helpful, but I'd suggest being careful about what
> statements are made involving the URI - i.e. does that resource
> actually identify the project? If so a direct rdf:about to it is ok,
> but otherwise a technique that's getting popular is to have an
> rdfs:seeAlso to it. There's not the same level of commitment, but
> automatic tools can still pick up the data. (Other statements such as
> the type of the resource can also be made, but aren't necessary from
> day one).

Interesting. Could we have a dated URI that has a seeAlso to the non-dated
one? Hmm, if we had those two is it really not an 'about'?

> [[
> > 1.3) How do we define a URI to represent a long lived (yet varying)
> > entity?
>
> eheh, great question ;-)
> ]]
>
> Easy! My home page is http://dannyayers.com. The representations of it
> (the HTML & RSS) vary a lot, but conceptually it's the same entity.

But how can I tell you things about how it looked yesterday or last week? I
don't think I can w/o adding that information separately (storing version in
an SCM, or whatever). I feel I want Gump to do that work for others.

> I'm a big fan of numerical URIs for long-term persisting things. The
> less implicit semantics in the URI, the higher the chance of surviving
> changes without requiring the URI to change.
> ]]
>
> If I understand Stefano correctly, I agree - if it's worth saying,
> make it explicit in some other way as additional statements, the URI
> is opaque to (most) machine processing.

Seems counter to the XML goal of 'human readable', but I can understand. So,
we'd generate an ID for a project, and have a triple assigning its name
(which could change). Might be good for tracking projects as they mature up
to TLP and such. That said, could we have two types? ID provided and
free-form so we don't need to centralize this?

>
> doap:Project owl:equivalentClass gump:Project
>

Now this makes sense. They are both the same class of thing, we just have a
few different views on properties (perhaps) for a while.

> There is another candidate relationship, owl:sameAs, but this says
> that the two individuals (in this case classes) are the same. This can
> be problematic both conceptually (are you sure the classes are
> *exactly* the same?) and sometimes in practice (the DL breed of
> reasoners tend to choke). It's not written in stone anywhere, but
> folks who generally know what they're doing (like Dan Brickley) tend
> to avoid owl:sameAs for this purpose.

Thanks for the heads up.

> You may be able to reuse(/hijack!) some of the DOAP authoring tools.
> There no reason Gump can't use the same syntax style (with equivalent
> meaning to your examples):
>
> <Project rdf:about="http://apache.org/gump/project/xml-xerces/"
>    xmlns="http://gump.apache.org/schemas/main/1.0/"
>    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> >
>     <dependsOn rdf:resource="http://apache.org/gump/project/xjavac/"/>
>     <dependsOn rdf:resource="http://apache.org/gump/project/bootstrap-ant/"/>
>     <residesWithin rdf:resource="http://apache.org/gump/repository/xml/"/>
>     <name>xml-xerces</name>
> </Project>
>
>
> [[
> > and how would we test/exercise it?
>
> you don't, you just publish your data in the best way possible and see
> what happens ;-)
> ]]
>
> Hmm, I dunno, this is one to think about. Assuming you had all the
> (combined) project data in a store, what kind of questions could you
> ask? What if you stuck the RDF into a reasoner and asserted project X
> (defined only in DOAP) depends on project Y (defined in Gump),
> presumably then X would inherit all the dependencies of Y. The
> reasoner may be able to spit out X's dependencies.
> Scruffy types tend to play around with this stuff in cwm [3], those
> that comb their hair often opt for Protege [4].
>

I like this approach, determine the questions. Stefano's license idea is on
the now, mine w/ version compatibility is on historical.

Hmm, dated is making my head hurt. I suspect it is time to drop that for
now, and just focus on 'todays view'...

> [1] http://purl.org/stuff/project/
> [2] http://www.w3.org/TR/swbp-n-aryRelations/
> [3] http://www.w3.org/2000/10/swap/doc/cwm.html
> [4] http://protege.stanford.edu/

Thanks for these.

regards

Adam

