Posted to users@jena.apache.org by Martynas Jusevicius <ma...@gmail.com> on 2011/05/02 10:51:08 UTC

Data partitioning dilemma (named graphs)

Hey list,

I want to improve provenance of RDF data in my app, and I'm mostly
looking at named graphs since reification does not seem to be used
that much.

One point of view is logical divisions:
- read-only core ontologies
- user ontologies
- user instance data
I could make a named graph for each of them.

The other is that I'd like to have metadata about every added/updated
triple so the app could say "User X updated resource Y with value of Z
on date W". In this case basically every triple should have its own
unique URI - i.e. be a named graph with a single statement?

It seems that I could implement either the first case or the second
with named graphs, but not both, which I would prefer.
How would you go about it - has anyone worked on use cases like this?
Should I still consider reification - and maybe use it together with
named graphs?

Thanks,

Martynas
semantic-web.dk

Re: Data partitioning dilemma (named graphs)

Posted by Martynas Jusevicius <ma...@gmail.com>.

I mean it makes sense to have a named graph per ontology because
ontologies already come with namespace URIs, which seem like natural
candidates for graph names.
They can be ignored of course, but I think I've seen an exotic
exception when TDB was loading owl.owl, with its namespace URI defined
in xml:base, into a named graph whose URI was different.


Re: Data partitioning dilemma (named graphs)

Posted by Dave Reynolds <da...@gmail.com>.

Hi Martynas,

On Tue, 2011-05-03 at 22:11 +0200, Martynas Jusevicius wrote: 
> Thanks Dave.
> 
> Right now I have a single named graph for all ontologies, but I guess
> a graph per ontology makes more sense.

I didn't mean to imply that was a requirement; it depends on what you
want to do.

> I need to iterate through all ontology classes however, so I still
> need a unified ontology model - how do I achieve that? I know of
> ModelFactory.createUnion(), but it only works on model pairs.

If that's a requirement then sticking to one graph for the combined
ontologies is just fine.

If you do want separate graphs but also want a union around then you can
create multi-way dynamic unions using OntModel.addSubModel.

> Speaking of your provenance work - how did you attach the UUID URI to
> the triple in the data graph without using reification?

In my case I had a provenance API to hide the details (I had both a
multiple graph and a reified-by-hash implementation, different
tradeoffs). For the UUID version I created a lexical form for the S, P,
O, did an MD5 digest of those and then wrapped that up as a urn:uuid
(i.e. a type 3 UUID). Then used that urn:uuid resource as the subject of
the provenance statements. That worked because (a) there were no bNodes
other than ones with stable internal anonIDs and (b) I only needed to go
from a statement to its provenance. If you need to retrieve the
statements themselves starting from provenance information then use the
reification vocabulary or named graphs.
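
A minimal sketch of the hashing step described above, in Java using only the standard library. `UUID.nameUUIDFromBytes` takes an MD5 digest of its input and sets the version bits to 3, so it yields a type 3 UUID directly. The class name `TripleIds`, the space-separated lexical form and the example.org terms are assumptions for illustration; the thread does not say which lexical form was actually used.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

// Hypothetical helper: derives a stable urn:uuid identifier from a triple.
final class TripleIds {
    private TripleIds() {}

    // Assumed lexical form: the three terms in N-Triples-like syntax.
    static String lexicalForm(String s, String p, String o) {
        return s + " " + p + " " + o + " .";
    }

    // MD5-based (type 3) UUID of the lexical form, wrapped as a urn:uuid URI.
    static String tripleUrn(String s, String p, String o) {
        byte[] bytes = lexicalForm(s, p, o).getBytes(StandardCharsets.UTF_8);
        UUID id = UUID.nameUUIDFromBytes(bytes);
        return "urn:uuid:" + id;
    }

    public static void main(String[] args) {
        // The same triple always produces the same identifier.
        System.out.println(tripleUrn(
                "<http://example.org/Y>",
                "<http://example.org/value>",
                "\"Z\""));
    }
}
```

Because the identifier is derived from the statement, the same triple always maps to the same URN, but the triple cannot be recovered from the URN. That matches caveat (b): this only supports going from a statement to its provenance.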

Cheers,
Dave


Re: Data partitioning dilemma (named graphs)

Posted by Martynas Jusevicius <ma...@gmail.com>.

Thanks Dave.

Right now I have a single named graph for all ontologies, but I guess
a graph per ontology makes more sense.
I need to iterate through all ontology classes however, so I still
need a unified ontology model - how do I achieve that? I know of
ModelFactory.createUnion(), but it only works on model pairs.

Speaking of your provenance work - how did you attach the UUID URI to
the triple in the data graph without using reification?

Martynas


Re: Data partitioning dilemma (named graphs)

Posted by Dave Reynolds <da...@gmail.com>.

Hi Martynas,

On Mon, 2011-05-02 at 10:51 +0200, Martynas Jusevicius wrote: 
> Hey list,
> 
> I want to improve provenance of RDF data in my app, and I'm mostly
> looking at named graphs since reification does not seem to be used
> that much.
> 
> One point of view is logical divisions:
> - read-only core ontologies
> - user ontologies
> - user instance data
> I could make a named graph for each of them.
> 
> The other is that I'd like to have metadata about every added/updated
> triple so the app could say "User X updated resource Y with value of Z
> on date W". In this case basically every triple should have its own
> unique URI - i.e. be a named graph with a single statement?
> 
> It seems that I could implement either the first case or the second
> with named graphs, but not both, which I would prefer.
> How would you go about it - has anyone worked on use cases like this?
> Should I still consider reification - and maybe use it together with
> named graphs?

I guess it depends on how you want to manage the data, whether you need
to limit queries to particular sub-categories of data, and just how much
data you are talking about.

In principle you could have a separate named graph both for each
ontology and for each atomic addition of user triples plus a separate
metadata graph. If atomic additions are made one triple at a time that
would be a lot of named graphs but it is possible.

If your updates include retractions then that gets messier in that you
have to remove the old graph as well as add to the new one, still
possible I guess.
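
The graph-per-addition layout can be sketched without any RDF library, as quads held in plain Java collections. Everything here is an illustrative assumption rather than Jena API: the class `GraphPerAddition`, the `urn:example:` names, minting graph names with `UUID.randomUUID()`, and representing triples as strings. Dropping an addition graph once it is empty is one possible reading of "remove the old graph".

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory stand-in for a quad store: graph name -> triples.
final class GraphPerAddition {
    static final String METADATA_GRAPH = "urn:example:metadata"; // assumed name

    final Map<String, Set<String>> graphs = new LinkedHashMap<>();

    // Each atomic addition gets a fresh named graph; statements about that
    // graph (who, when) go into the separate metadata graph.
    String add(Set<String> triples, String user, String date) {
        String graphName = "urn:uuid:" + UUID.randomUUID();
        graphs.put(graphName, new LinkedHashSet<>(triples));
        Set<String> meta = graphs.computeIfAbsent(METADATA_GRAPH, k -> new LinkedHashSet<>());
        meta.add("<" + graphName + "> <urn:example:addedBy> \"" + user + "\" .");
        meta.add("<" + graphName + "> <urn:example:addedOn> \"" + date + "\" .");
        return graphName;
    }

    // Retraction: remove the triple from the addition graph that holds it,
    // and drop that graph if nothing is left in it.
    void retract(String triple) {
        Iterator<Map.Entry<String, Set<String>>> it = graphs.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> e = it.next();
            if (e.getKey().equals(METADATA_GRAPH)) continue;
            if (e.getValue().remove(triple) && e.getValue().isEmpty()) it.remove();
        }
    }

    // An update is a retraction of the old value plus a new addition.
    String update(String oldTriple, String newTriple, String user, String date) {
        retract(oldTriple);
        return add(Collections.singleton(newTriple), user, date);
    }
}
```

Note that this sketch keeps the metadata about a retracted graph as history; whether to delete it as well is a design decision the thread leaves open.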

FWIW the last time I did serious work with triple-level provenance
(which was before named graphs were so much in vogue) I worked it with
just two graphs - one for the asserted data and one for the metadata.
The metadata graph could have used the reification vocabulary but I
found it easier to generate a hash to identify the triple in the data
graph and then use the hash (as a UUID URI) as the subject of provenance
triples in the metadata graph. That's isomorphic to using reification
but is more compact and easier to query.
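
A rough sketch of the two-graph arrangement, using plain Java collections in place of real RDF graphs. The class `TwoGraphProvenance`, the string representation of triples and the `updatedBy`/`updatedOn` labels are hypothetical; the message above does not name a hash function, so deriving the key with `UUID.nameUUIDFromBytes` (an MD5-based UUID) is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical model: one "graph" of asserted triples, one of provenance.
final class TwoGraphProvenance {
    // Asserted data, one string per triple.
    final Set<String> dataGraph = new LinkedHashSet<>();
    // Provenance statements, keyed by the hash-derived subject URI.
    final Map<String, List<String>> metadataGraph = new HashMap<>();

    // Hash of the triple, wrapped as a urn:uuid URI.
    static String idFor(String triple) {
        return "urn:uuid:" + UUID.nameUUIDFromBytes(triple.getBytes(StandardCharsets.UTF_8));
    }

    // Assert the triple and record who added it and when.
    void add(String triple, String user, String date) {
        dataGraph.add(triple);
        List<String> prov = metadataGraph.computeIfAbsent(idFor(triple), k -> new ArrayList<>());
        prov.add("updatedBy " + user);
        prov.add("updatedOn " + date);
    }

    // Lookup goes from a statement to its provenance, by recomputing the hash.
    List<String> provenanceOf(String triple) {
        return metadataGraph.getOrDefault(idFor(triple), Collections.<String>emptyList());
    }
}
```

The data graph stays free of bookkeeping triples, and a provenance lookup is a single key computation rather than a four-triple reification pattern match.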

Dave