Posted to users@jena.apache.org by Frank Budinsky <fr...@ca.ibm.com> on 2012/07/13 18:04:15 UTC
LARQ query with GRAPH clause
Hi,
We've noticed that this (unionDefaultGraph = true) query:
SELECT ?subject ?predicate ?object ?score
WHERE {
  (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
  ?subject ?predicate ?object .
}
ORDER BY Desc(?score)
runs significantly faster (i.e., 100x) than this one:
SELECT ?subject ?predicate ?object ?graph ?score
WHERE {
  GRAPH ?graph {
    (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
    ?subject ?predicate ?object .
  }
}
ORDER BY Desc(?score)
Is that expected, and if so, is there another (more efficient) way of
writing such a query that also returns the graphs of the matches?
Thanks,
Frank.
Re: LARQ query with GRAPH clause
Posted by Frank Budinsky <fr...@ca.ibm.com>.
Andy Seaborne <an...@gmail.com> wrote on 07/14/2012 04:59:13 AM:
> From: Andy Seaborne <an...@apache.org>
> To: users@jena.apache.org,
> Date: 07/14/2012 05:00 AM
> Subject: Re: LARQ query with GRAPH clause
> Sent by: Andy Seaborne <an...@gmail.com>
>
> On 13/07/12 21:55, Andy Seaborne wrote:
> > On 13/07/12 17:04, Frank Budinsky wrote:
> >>
> >>
> >> Hi,
> >>
> >> We've noticed that this (unionDefaultGraph = true) query:
> >>
> >> SELECT ?subject ?predicate ?object ?score
> >> WHERE {
> >> (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch>
> >> "cruise" .
> >> ?subject ?predicate ?object .
> >> }
> >> ORDER BY Desc(?score)
> >>
> >> runs significantly faster (i.e., 100x) than this one:
> >>
> >> SELECT ?subject ?predicate ?object ?graph ?score
> >> WHERE {
> >> GRAPH ?graph {
> >> (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch>
> >> "cruise" .
> >> ?subject ?predicate ?object .
> >> }
> >> }
> >> ORDER BY Desc(?score)
> >>
> >> Is that expected, and if so, is there another (more efficient) way of
> >> writing such a query that also returns the graphs of the matches?
> >
> > Which storage layer is this? (TDB?)
> > How many named graphs are there? And other details of the data distribution?
> >
> > Given the other report about property functions and TDB, can I assume
> > you are using 0.9.1 or 0.9.2?
> >
> > It is possible it will be slower when there are many named graphs - with
> > GRAPH ?g and a property function ARQ may have to iterate over each named
> > graph in order to know whether the pattern
> >
> > {
> > (?object ?score) pf:textMatch "cruise" .
> > ?subject ?predicate ?object .
> > }
> >
> > matches for that graph.
> >
> > {
> > (?object ?score) pf:textMatch "cruise" .
> > ?subject ?predicate ?object .
> > }
> >
> > on the union graph is in effect ignoring the quad field.
> >
> > ARQ does not have property function support for quads, only triples.
> >
>
> The text index is not tied to the graph.
>
> Corollary:
>
> WHERE
> {
> (?object ?score) pf:textMatch "cruise" .
> GRAPH ?graph {
> ?subject ?predicate ?object .
> }
> }
>
> should be efficient - it goes to the text index once, then checks all
> the graphs (as a single quad pattern in TDB).
>
> Andy
>
Hi Andy,
You're absolutely right about the cause of the problem - the performance of
the original query is tied to the number of named graphs in the datastore.
Your suggested alternative query fixes the problem.
Thanks a lot for your help.
Frank.
Re: LARQ query with GRAPH clause
Posted by Andy Seaborne <an...@apache.org>.
On 13/07/12 21:55, Andy Seaborne wrote:
> On 13/07/12 17:04, Frank Budinsky wrote:
>>
>>
>> Hi,
>>
>> We've noticed that this (unionDefaultGraph = true) query:
>>
>> SELECT ?subject ?predicate ?object ?score
>> WHERE {
>> (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch>
>> "cruise" .
>> ?subject ?predicate ?object .
>> }
>> ORDER BY Desc(?score)
>>
>> runs significantly faster (i.e., 100x) than this one:
>>
>> SELECT ?subject ?predicate ?object ?graph ?score
>> WHERE {
>> GRAPH ?graph {
>> (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch>
>> "cruise" .
>> ?subject ?predicate ?object .
>> }
>> }
>> ORDER BY Desc(?score)
>>
>> Is that expected, and if so, is there another (more efficient) way of
>> writing such a query that also returns the graphs of the matches?
>
> Which storage layer is this? (TDB?)
> How many named graphs are there? And other details of the data distribution?
>
> Given the other report about property functions and TDB, can I assume
> you are using 0.9.1 or 0.9.2?
>
> It is possible it will be slower when there are many named graphs - with
> GRAPH ?g and a property function ARQ may have to iterate over each named
> graph in order to know whether the pattern
>
> {
> (?object ?score) pf:textMatch "cruise" .
> ?subject ?predicate ?object .
> }
>
> matches for that graph.
>
> {
> (?object ?score) pf:textMatch "cruise" .
> ?subject ?predicate ?object .
> }
>
> on the union graph is in effect ignoring the quad field.
>
> ARQ does not have property function support for quads, only triples.
>
The text index is not tied to the graph.
Corollary:
WHERE
{
(?object ?score) pf:textMatch "cruise" .
GRAPH ?graph {
?subject ?predicate ?object .
}
}
should be efficient - it goes to the text index once, then checks all
the graphs (as a single quad pattern in TDB).
Andy
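For reference, the rewritten pattern above might be written out as a complete query along the following lines. This is only a sketch and was not posted in the thread: it assumes pf: is bound to the same namespace as the full textMatch URI in the original query, and it reuses the SELECT list and ORDER BY clause from the original GRAPH query.

```sparql
# Assumed prefix binding, taken from the full textMatch URI in the original query.
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>

SELECT ?subject ?predicate ?object ?graph ?score
WHERE {
  # The text index lookup sits outside the GRAPH block, so it runs once.
  (?object ?score) pf:textMatch "cruise" .
  # The matched objects are then looked up across all named graphs.
  GRAPH ?graph {
    ?subject ?predicate ?object .
  }
}
ORDER BY DESC(?score)
```

The only structural difference from the slow query is that the property function has moved out of the GRAPH block, so it is no longer evaluated once per named graph.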
Re: LARQ query with GRAPH clause
Posted by Andy Seaborne <an...@apache.org>.
On 13/07/12 17:04, Frank Budinsky wrote:
>
>
> Hi,
>
> We've noticed that this (unionDefaultGraph = true) query:
>
> SELECT ?subject ?predicate ?object ?score
> WHERE {
> (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch>
> "cruise" .
> ?subject ?predicate ?object .
> }
> ORDER BY Desc(?score)
>
> runs significantly faster (i.e., 100x) than this one:
>
> SELECT ?subject ?predicate ?object ?graph ?score
> WHERE {
> GRAPH ?graph {
> (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch>
> "cruise" .
> ?subject ?predicate ?object .
> }
> }
> ORDER BY Desc(?score)
>
> Is that expected, and if so, is there another (more efficient) way of
> writing such a query that also returns the graphs of the matches?
Which storage layer is this? (TDB?)
How many named graphs are there? And other details of the data distribution?
Given the other report about property functions and TDB, can I assume
you are using 0.9.1 or 0.9.2?
It is possible it will be slower when there are many named graphs - with
GRAPH ?g and a property function ARQ may have to iterate over each named
graph in order to know whether the pattern
{
(?object ?score) pf:textMatch "cruise" .
?subject ?predicate ?object .
}
matches for that graph.
{
(?object ?score) pf:textMatch "cruise" .
?subject ?predicate ?object .
}
on the union graph is in effect ignoring the quad field.
ARQ does not have property function support for quads, only triples.
Could it be done? Yes. Basically, ignoring the graph field of the quad
is possibly enough, but a slightly different interface to expose the
named graph field would be nicer. Then modify the property function
transformation to cope with quads as well as triples, and finally modify
the quad transformation to work on property functions (so the transforms
can be applied in either order).
Contributions of patches welcome (or other ways to make it happen).
Andy