Posted to users@jena.apache.org by Frank Budinsky <fr...@ca.ibm.com> on 2012/07/13 18:04:15 UTC

LARQ query with GRAPH clause


Hi,

We've noticed that this (unionDefaultGraph = true) query:

SELECT ?subject ?predicate ?object ?score
  WHERE {
    (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
    ?subject ?predicate ?object .
  }
ORDER BY Desc(?score)

runs significantly faster (i.e., 100x) than this one:

SELECT ?subject ?predicate ?object ?graph ?score
  WHERE {
    GRAPH ?graph {
      (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
      ?subject ?predicate ?object .
    }
  }
ORDER BY Desc(?score)

Is that expected, and if so, is there another (more efficient) way of
writing such a query that also returns the graphs of the matches?

Thanks,
Frank.

Re: LARQ query with GRAPH clause

Posted by Frank Budinsky <fr...@ca.ibm.com>.


Andy Seaborne <an...@gmail.com> wrote on 07/14/2012 04:59:13 AM:

> From: Andy Seaborne <an...@apache.org>
> To: users@jena.apache.org,
> Date: 07/14/2012 05:00 AM
> Subject: Re: LARQ query with GRAPH clause
> Sent by: Andy Seaborne <an...@gmail.com>
>
> On 13/07/12 21:55, Andy Seaborne wrote:
> > On 13/07/12 17:04, Frank Budinsky wrote:
> >>
> >>
> >> Hi,
> >>
> >> We've noticed that this (unionDefaultGraph = true) query:
> >>
> >> SELECT ?subject ?predicate ?object ?score
> >>    WHERE {
> >>      (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
> >>      ?subject ?predicate ?object .
> >>    }
> >> ORDER BY Desc(?score)
> >>
> >> runs significantly faster (i.e., 100x) than this one:
> >>
> >> SELECT ?subject ?predicate ?object ?graph ?score
> >>    WHERE {
> >>      GRAPH ?graph {
> >>        (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
> >>        ?subject ?predicate ?object .
> >>      }
> >>    }
> >> ORDER BY Desc(?score)
> >>
> >> Is that expected, and if so, is there another (more efficient) way of
> >> writing such a query that also returns the graphs of the matches?
> >
> > Which storage layer is this? (TDB?)
> > How many named graphs are there? And other details of the data distribution?
> >
> > Given the other report about property functions and TDB, can I assume
> > you are using 0.9.1 or 0.9.2?
> >
> > It is possible it will be slower when there are many named graphs - with
> > GRAPH ?g and a property function, ARQ may have to iterate over each named
> > graph in order to know whether the pattern
> >
> > {
> >    (?object ?score) pf:textMatch "cruise" .
> >    ?subject ?predicate ?object .
> > }
> >
> > matches for that graph.
> >
> > {
> >   (?object ?score) pf:textMatch "cruise" .
> >    ?subject ?predicate ?object .
> > }
> >
> > on the union graph is in effect ignoring the quad field.
> >
> > ARQ does not have property function support for quads, only triples.
> >
>
> The text index is not tied to the graph.
>
> Corollary:
>
> WHERE
> {
>     (?object ?score) pf:textMatch "cruise" .
>      GRAPH ?graph {
>         ?subject ?predicate ?object .
>        }
> }
>
> should be efficient - it goes to the text index once, then checks all
> the graphs (as a single quad pattern in TDB).
>
>    Andy
>

Hi Andy,

You're absolutely right about the cause of the problem - the performance of
the original query is tied to the number of named graphs in the datastore.

Your suggested alternative query fixes the problem.

Thanks a lot for your help.

Frank.


Re: LARQ query with GRAPH clause

Posted by Andy Seaborne <an...@apache.org>.

On 13/07/12 21:55, Andy Seaborne wrote:
> On 13/07/12 17:04, Frank Budinsky wrote:
>>
>>
>> Hi,
>>
>> We've noticed that this (unionDefaultGraph = true) query:
>>
>> SELECT ?subject ?predicate ?object ?score
>>    WHERE {
>>      (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
>>      ?subject ?predicate ?object .
>>    }
>> ORDER BY Desc(?score)
>>
>> runs significantly faster (i.e., 100x) than this one:
>>
>> SELECT ?subject ?predicate ?object ?graph ?score
>>    WHERE {
>>      GRAPH ?graph {
>>        (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
>>        ?subject ?predicate ?object .
>>      }
>>    }
>> ORDER BY Desc(?score)
>>
>> Is that expected, and if so, is there another (more efficient) way of
>> writing such a query that also returns the graphs of the matches?
>
> Which storage layer is this? (TDB?)
> How many named graphs are there? And other details of the data distribution?
>
> Given the other report about property functions and TDB, can I assume
> you are using 0.9.1 or 0.9.2?
>
> It is possible it will be slower when there are many named graphs - with
> GRAPH ?g and a property function, ARQ may have to iterate over each named
> graph in order to know whether the pattern
>
> {
>    (?object ?score) pf:textMatch "cruise" .
>    ?subject ?predicate ?object .
> }
>
> matches for that graph.
>
> {
>   (?object ?score) pf:textMatch "cruise" .
>    ?subject ?predicate ?object .
> }
>
> on the union graph is in effect ignoring the quad field.
>
> ARQ does not have property function support for quads, only triples.
>

The text index is not tied to the graph.

Corollary:

WHERE
{
    (?object ?score) pf:textMatch "cruise" .
     GRAPH ?graph {
        ?subject ?predicate ?object .
       }
}

should be efficient - it goes to the text index once, then checks all 
the graphs (as a single quad pattern in TDB).
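
Combined with the SELECT and ORDER BY clauses from the original question, the
full query would presumably look like the sketch below. The PREFIX declaration
is an assumption, taken from the full textMatch IRI used in the question; the
query is written out for illustration and has not been run here.

```sparql
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>

SELECT ?subject ?predicate ?object ?graph ?score
WHERE {
  # Text index lookup sits outside GRAPH, so it is evaluated once.
  (?object ?score) pf:textMatch "cruise" .
  # The triple pattern is then matched across all named graphs.
  GRAPH ?graph {
    ?subject ?predicate ?object .
  }
}
ORDER BY DESC(?score)
```

The only change from the slow query is that the textMatch pattern has moved
out of the GRAPH block; ?graph is still bound by the triple pattern inside it.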

	Andy

Re: LARQ query with GRAPH clause

Posted by Andy Seaborne <an...@apache.org>.

On 13/07/12 17:04, Frank Budinsky wrote:
>
>
> Hi,
>
> We've noticed that this (unionDefaultGraph = true) query:
>
> SELECT ?subject ?predicate ?object ?score
>    WHERE {
>      (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
>      ?subject ?predicate ?object .
>    }
> ORDER BY Desc(?score)
>
> runs significantly faster (i.e., 100x) than this one:
>
> SELECT ?subject ?predicate ?object ?graph ?score
>    WHERE {
>      GRAPH ?graph {
>        (?object ?score) <http://jena.hpl.hp.com/ARQ/property#textMatch> "cruise" .
>        ?subject ?predicate ?object .
>      }
>    }
> ORDER BY Desc(?score)
>
> Is that expected, and if so, is there another (more efficient) way of
> writing such a query that also returns the graphs of the matches?

Which storage layer is this? (TDB?)
How many named graphs are there? And other details of the data distribution?

Given the other report about property functions and TDB, can I assume 
you are using 0.9.1 or 0.9.2?

It is possible it will be slower when there are many named graphs - with
GRAPH ?g and a property function, ARQ may have to iterate over each named
graph in order to know whether the pattern

{
   (?object ?score) pf:textMatch "cruise" .
   ?subject ?predicate ?object .
}

matches for that graph.

{
  (?object ?score) pf:textMatch "cruise" .
   ?subject ?predicate ?object .
}

on the union graph is in effect ignoring the quad field.

ARQ does not have property function support for quads, only triples.

Could it be done? Yes. Basically, ignoring the graph field of the quad
is possibly enough, but a slightly different interface to expose the
named graph field would be nicer. Then modify the property function
transformation to cope with quads as well as triples, and finally modify
the quad transformation to work on property functions (so the transforms
can be applied in either order).

Contributions of patches welcome (or other ways to make it happen).

	Andy