Posted to dev@jena.apache.org by Paolo Castagna <ca...@googlemail.com> on 2012/03/10 13:57:38 UTC

On SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } ...

Hi,
this is one of those queries people want to do: SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } (see recent thread on jena-users).
It isn't unreasonable: people want to know which graphs are in an RDF dataset. But, as with SELECT DISTINCT ?p WHERE {?s ?p ?o.} or SELECT DISTINCT ?cls WHERE {?i a ?cls.}, I do not see what we could
possibly do to speed things up. It's a scan of the entire index and therefore expensive, even more so if the index does not fit into RAM.

However, if there were an easy way to see all the triples/quads added/removed, the answers to those queries would be very easy to compute and fast to return. One would still need to spot the query and have the optimizer use a
different index for it. I also imagine that once you start going down this path, more and more 'useful' queries will appear => more and more indexes => more and more slowdown at write time.
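
To make the trade-off concrete, here is a minimal sketch in Python. It is not Jena or TDB code: the QuadStore class and its method names are hypothetical, and the "index" is just a per-graph counter maintained on every add/remove. Listing the graphs from the counter avoids the full scan, at the price of extra work on each write.

```python
from collections import Counter


class QuadStore:
    """Toy in-memory quad store (hypothetical; not how Jena/TDB stores data)."""

    def __init__(self):
        self.quads = set()             # (graph, subject, predicate, object)
        self.graph_counts = Counter()  # extra index: number of quads per graph

    def add(self, g, s, p, o):
        quad = (g, s, p, o)
        if quad not in self.quads:
            self.quads.add(quad)
            self.graph_counts[g] += 1  # the additional write-time cost

    def remove(self, g, s, p, o):
        quad = (g, s, p, o)
        if quad in self.quads:
            self.quads.remove(quad)
            self.graph_counts[g] -= 1
            if self.graph_counts[g] == 0:
                del self.graph_counts[g]  # graph no longer has any quads

    def graphs_by_scan(self):
        # What SELECT DISTINCT ?g amounts to without help: touch every quad.
        return {g for (g, _, _, _) in self.quads}

    def graphs_by_index(self):
        # Same answer, but the cost depends on the number of graphs only.
        return set(self.graph_counts)
```

Every further "useful" query of this kind (distinct predicates, distinct classes) would need its own counter and its own bookkeeping in add() and remove(), which is the write-time slowdown mentioned above.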

Paolo

Re: On SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } ...

Posted by Paolo Castagna <ca...@googlemail.com>.
Hi Andy

Andy Seaborne wrote:
> On 10/03/12 12:57, Paolo Castagna wrote:
>> Hi, this is one of those queries people want to do: SELECT DISTINCT
>> ?g WHERE { GRAPH ?g { ?s ?p ?o } } (see recent thread on
>> jena-users). It isn't unreasonable: people want to know what are the
>> graphs in an RDF dataset, but as for SELECT DISTINCT ?p WHERE {?s ?p
>> ?o.} or SELECT DISTINCT ?cls WHERE {?i a ?cls.}, I do not see what we
>> could possibly do to speed things up. It's a scan of the entire index
>> and therefore expensive, even more so if the index does not fit into
>> RAM.
>>
>> However, if there was an easy way to see all the triples/quads
>> added/removed, those would be very easy to compute and fast to
>> return... however, one needs to spot the query and have the optimizer
>> use a different index for those queries. I also imagine that once you
>> start going down this path, more and more 'useful' queries will
>> appear =>  more and more indexes =>  more and more slowdown at write
>> time.
>>
>> Paolo
> 
> Isn't that a good use case for query caching?

Indeed.

That said, depending on the rate of updates (in this case, how often graphs are added or removed), invalidating and refreshing the cache might become a challenge.
For read-only or mostly-read scenarios, definitely yes.
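
A minimal sketch of what I mean, in Python rather than Jena code. The CachingEndpoint class and its evaluate callback are hypothetical, and the invalidation is deliberately the crudest possible: any update throws the whole cache away. That is cheap when writes are rare and useless when they are frequent.

```python
class CachingEndpoint:
    """Caches query results by query text; clears everything on update."""

    def __init__(self, evaluate):
        self._evaluate = evaluate  # function: query text -> result
        self._cache = {}
        self.misses = 0            # how often the real evaluation ran

    def query(self, text):
        if text not in self._cache:
            self.misses += 1
            self._cache[text] = self._evaluate(text)
        return self._cache[text]

    def update(self, apply_change):
        apply_change()
        # Coarse invalidation: no attempt to work out which cached
        # results the change actually affects.
        self._cache.clear()
```

A finer-grained scheme would have to decide which cached queries a given add/remove can affect, and that is where the challenge lies.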

> There is a need for a graph management facility, and an effect of that
> could be to have GRAPH ?g {} (or anything optimized to the same) go look
> in the graph info data structures.

I will.

> But query caching has so many other advantages, seems a shame not to us it.

Yep.

Do you mean "a shame not to use it"?
What's 'it'? :-)

Paolo

> 
>     Andy

Re: On SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } } ...

Posted by Andy Seaborne <an...@apache.org>.
On 10/03/12 12:57, Paolo Castagna wrote:
> Hi, this is one of those queries people want to do: SELECT DISTINCT
> ?g WHERE { GRAPH ?g { ?s ?p ?o } } (see recent thread on
> jena-users). It isn't unreasonable: people want to know what are the
> graphs in an RDF dataset, but as for SELECT DISTINCT ?p WHERE {?s ?p
> ?o.} or SELECT DISTINCT ?cls WHERE {?i a ?cls.}, I do not see what we
> could possibly do to speed things up. It's a scan of the entire index
> and therefore expensive, even more so if the index does not fit into
> RAM.
>
> However, if there was an easy way to see all the triples/quads
> added/removed, those would be very easy to compute and fast to
> return... however, one needs to spot the query and have the optimizer
> use a different index for those queries. I also imagine that once you
> start going down this path, more and more 'useful' queries will
> appear =>  more and more indexes =>  more and more slowdown at write
> time.
>
> Paolo

Isn't that a good use case for query caching?

There is a need for a graph management facility, and an effect of that
could be to have GRAPH ?g {} (or anything optimized to the same) go look
in the graph info data structures.
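
As a rough illustration only, in Python and not ARQ code: the sketch below spots the exact query shape from this thread and answers it from a set of graph names instead of scanning. The answer function, the graph_info set and the full_scan callback are all hypothetical, and it assumes graph_info lists only graphs that contain at least one triple. A real optimizer would match on the query algebra, not on the query text.

```python
import re

# Recognises only: SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }
# (any variable names, any whitespace, optional trailing dot).
_LIST_GRAPHS = re.compile(
    r"SELECT\s+DISTINCT\s+\?(\w+)\s+WHERE\s*\{\s*GRAPH\s+\?(\w+)\s*"
    r"\{\s*\?(\w+)\s+\?(\w+)\s+\?(\w+)\s*\.?\s*\}\s*\}",
    re.IGNORECASE,
)


def answer(query, graph_info, full_scan):
    """Use graph_info for the 'list graphs' query, else fall back to a scan."""
    m = _LIST_GRAPHS.fullmatch(query.strip())
    if m:
        selected, graph_var, s, p, o = m.groups()
        # The shortcut is only valid if the projected variable is the graph
        # variable and the inner pattern is an unconstrained triple.
        if selected == graph_var and len({graph_var, s, p, o}) == 4:
            return sorted(graph_info)
    return full_scan(query)
```

Anything that does not match exactly, for example a pattern such as ?s ?p ?s, goes down the normal evaluation path.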

But query caching has so many other advantages, seems a shame not to us it.

	Andy