Posted to users@jena.apache.org by Holger Knublauch <ho...@knublauch.com> on 2012/02/16 03:04:34 UTC

How to insert something into the first executing block of a SPARQL query in ARQ

I have a rather advanced question and there may not be a general solution, but I'll try anyway...

Problem: Given an arbitrary SPARQL WHERE clause, I want to insert a line

	?this a ?SOME_TYPE .

so that it always gets executed "first" - I want ?this to be bound to all instances of a given class before the rest of the WHERE clause executes. For example if I have 

WHERE {
    {
        BIND (ex:fun(?this) AS ?result) .
    }
    FILTER (?result != 1)
}

then I want to programmatically rewrite this to

WHERE {
    {
        ?this a ?SOME_TYPE .
        BIND (ex:fun(?this) AS ?result) .
    }
    FILTER (?result != 1)
}

Is there any clean way of implementing this? My naive approach would be to walk the opening braces until I find the first one with a matching closing brace (or do the equivalent operation on the syntax tree instead of strings). Given the policy that SPARQL is executed from the inside out, is this a safe assumption? Are there alternative approaches to achieving the goal (of iterating over all instances of a class)? Are there cases where I need to repeat the binding of ?this in multiple sub-blocks? Or would it be more efficient to do the iteration "outside" and simply run the same query for each instance, with ?this pre-bound through initial bindings?

Thanks a lot
Holger


Re: How to insert something into the first executing block of a SPARQL query in ARQ

Posted by Holger Knublauch <ho...@knublauch.com>.
On Feb 20, 2012, at 7:12 PM, Andy Seaborne wrote:
>> Otherwise it inserts a
>> ?this a ?THIS_TYPE clause into the start of the WHERE clause to
>> optimize performance.
> 
> What happens if ?this is used in several places?

It should work fine - it's bound in the beginning of the block so anything "under" that will iterate over the same instances.


> If you process the algebra, then the overhead is less as you can process the post-optimizer algebra and execute that directly.

Yes, but my understanding of this part of Jena is not good enough yet to go into such a low level. The other problem is simply the number of different cases to consider. For example take

WHERE {
	GRAPH ex:other {
		BIND (ex:getLabel(?this) AS ?label) .
	}
}

Assuming the labels are in a different graph than the rdf:type triples (that inform us about the instances) then this pre-processing becomes very hard. I cannot simply say

WHERE {
	GRAPH ex:other {
		?this a ?TYPE_CLASS .
		BIND (ex:getLabel(?this) AS ?label) .
	}
}

As soon as exceptions to the rule such as the one above show up, I believe I need some outer looping mechanism anyway. I could optimize it from there for special cases.


> Do you need to have this split mode execution?  Why not build a single query that calculates ?this (we do this in the linked data API), not puts it in a repeated substitution?

What would this look like exactly?

Thanks
Holger


Re: How to insert something into the first executing block of a SPARQL query in ARQ

Posted by Andy Seaborne <an...@apache.org>.
On 20/02/12 06:09, Holger Knublauch wrote:
> On Feb 18, 2012, at 9:48 PM, Andy Seaborne wrote:
>> Why not prebind it?
>
> The problem is that ?this iterates over many instances, while
> pre-binding would only allow me to bind a single value per
> execution.

Do you expect the system to do any better than looping over the instances?

BINDINGS does, in fact, give that possibility although it's not going to 
happen because, internally, the optimizer flips the data to the left 
side of a join and then it will use index joins for execution.

The only possibility I see (and this isn't ARQ specific) is if the 
expression that is used to calculate the ?this instances is placed in 
the query itself so optimization sees everything at once, else you are, 
in effect, second guessing the optimization process.

> However, this is the solution that I today ended up with. From your
> responses I noticed that there is no simple generic solution to
> pre-binding an iteration of values to a variable everywhere in the
> query,

Submit a patch!

> so that the SPIN API now uses the following logic: If a rule
> or constraint uses ?this in a deeply nested clause (i.e. not on the
> top level of the WHERE clause) then it uses an outer loop to walk
> through the instances and pre-bind ?this. Otherwise it inserts a
> ?this a ?THIS_TYPE clause into the start of the WHERE clause to
> optimize performance.

What happens if ?this is used in several places?

> This algorithm sounds like the safest (less error prone than trying
> to be super smart about the syntax of queries). If I get too much
> pushback from SPIN users that performance has gone down (due to the
> many QueryExecution objects etc)

If you process the algebra, then the overhead is less as you can process 
the post-optimizer algebra and execute that directly.

> then I can always try to get back to
> a smarter algorithm. But not all queries require using ?this in deep
> places, and users can work around this by reformulating their queries
> as well.
>
> Thanks Holger

Do you need to have this split mode execution?  Why not build a single 
query that calculates ?this (we do this in the linked data API), rather 
than putting it in a repeated substitution?

	Andy


Re: How to insert something into the first executing block of a SPARQL query in ARQ

Posted by Holger Knublauch <ho...@knublauch.com>.
On Feb 18, 2012, at 9:48 PM, Andy Seaborne wrote:
> Why not prebind it?

The problem is that ?this iterates over many instances, while pre-binding would only allow me to bind a single value per execution.

However, this is the solution that I ended up with today. From your responses I gathered that there is no simple generic solution to pre-binding an iteration of values to a variable everywhere in the query, so the SPIN API now uses the following logic: If a rule or constraint uses ?this in a deeply nested clause (i.e. not on the top level of the WHERE clause) then it uses an outer loop to walk through the instances and pre-bind ?this. Otherwise it inserts a ?this a ?THIS_TYPE clause at the start of the WHERE clause to optimize performance.

This algorithm sounds like the safest (less error prone than trying to be super smart about the syntax of queries). If I get too much pushback from SPIN users that performance has gone down (due to the many QueryExecution objects etc) then I can always try to get back to a smarter algorithm. But not all queries require using ?this in deep places, and users can work around this by reformulating their queries as well.
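
The split described above could be sketched roughly as follows. This is only a string-based approximation for illustration: the class and method names are hypothetical, it treats "deeply nested" as "inside a second level of braces", and it ignores braces that appear inside string literals, IRIs or comments. A real implementation would walk the parsed syntax tree instead.

```java
final class ThisUsage {
    /**
     * Returns true if ?this occurs inside a nested group, i.e. at brace
     * depth 2 or more, counting the outer WHERE braces as depth 1.
     */
    static boolean usesThisInNestedGroup(String whereClause) {
        int depth = 0;
        for (int i = 0; i < whereClause.length(); i++) {
            char c = whereClause.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            } else if (depth > 1 && whereClause.startsWith("?this", i)) {
                // Make sure we matched ?this itself and not e.g. ?thisOther.
                int end = i + "?this".length();
                boolean wholeName = end >= whereClause.length()
                        || !(Character.isLetterOrDigit(whereClause.charAt(end))
                             || whereClause.charAt(end) == '_');
                if (wholeName) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller would run the outer loop with initial bindings when this returns true, and insert the ?this a ?THIS_TYPE clause at the start of the WHERE clause otherwise.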

Thanks
Holger


Re: How to insert something into the first executing block of a SPARQL query in ARQ

Posted by Andy Seaborne <an...@apache.org>.
On 18/02/12 11:48, Andy Seaborne wrote:
> On 18/02/12 08:48, Holger Knublauch wrote:
>>
>> On Feb 18, 2012, at 12:35 AM, Andy Seaborne wrote:
>>> Another suggestion: put a marker pattern into the query and then
>>> you can find it again.
>>>
>>> Put in the triple <tq:holger> <tq:holger> <tq:holger> .
>>
>> Thanks, Andy, for looking into this. Unfortunately those queries are
>> not written by myself, and they might be anything. So I cannot rely
>> on a marker triple. It's basically a feature of the SPIN framework in
>> which rules and constraints can be attached to classes. The variable
>> ?this is then supposed to be (pre) bound to all instances of those
>> classes. My current solution was to insert this clause that iterates
>> over all instances where ?SOME_TYPE is pre-bound to the class using
>> initial bindings. This works fine in most cases, but if some query
>> only uses a function call (and BIND) then the ?this variable may not
>> already be bound and the function call fails.
>
> Why not prebind it?

PS There is also BINDINGS in SPARQL 1.1

WHERE {
     {
         BIND (ex:fun(?this) AS ?result) .
     }
     FILTER (?result != 1)
}
BINDINGS ?this { (<foo>) }
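
In the final SPARQL 1.1 Recommendation the BINDINGS keyword was renamed VALUES, and a data block can carry many rows, so ?this is not limited to one value per execution. A minimal sketch of building such a clause from a list of instance IRIs, independent of ARQ; the class and method names are hypothetical, and the IRIs are assumed to be absolute and free of characters that would need escaping:

```java
import java.util.List;

final class ValuesClause {
    /** Builds "VALUES ?var { <iri1> <iri2> ... }" for a single variable. */
    static String forIris(String varName, List<String> iris) {
        StringBuilder sb = new StringBuilder("VALUES ?").append(varName).append(" {");
        for (String iri : iris) {
            // Each IRI becomes one row of the single-column data block.
            sb.append(" <").append(iri).append('>');
        }
        return sb.append(" }").toString();
    }
}
```

Placing the generated clause inside the group, before the BIND that reads ?this, keeps the variable in scope when the expression is evaluated.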


>
> static public void main(String ... args)
> {
> Model m = ModelFactory.createDefaultModel() ;
> Query q = QueryFactory.create("SELECT * { BIND(?x+1 AS ?y) }") ;
> QuerySolutionMap map = new QuerySolutionMap() ;
> map.add("x", m.createTypedLiteral(1)) ;
> QueryExecution qexec = QueryExecutionFactory.create(q, m, map) ;
> ResultSetFormatter.out(qexec.execSelect());
> }
>
>> Is there perhaps a way to control the table unit (if that's always
>> there), so that this could instead be a BGP?
>
> Yes - write a algebra transform to replace (table unit) with what you want.
>
> Subclass TransformCopy and override
>
> public Op transform(OpTable opTable)
>
>> Maybe I could
>> pre-process the ops but I don't understand the algebra mechanism well
>> enough on a reliable pattern on what replacement I would need to do.
>> A syntactic insertion based on { sounds brittle, esp since there are
>> things like GRAPH etc that switch to a completely different query
>> graph.
>>
>> Maybe I need to really do the iteration "outside" for certain rules
>> and constraints if there is no simple place for a ?this binding BGP.
>> This will be slower but at least correct. Do you have any other
>> ideas?
>
> I think I understand you description but the details (of SPIN setup)
> matter here so maybe it would be better to discuss real code. Do you
> have a complete example?
>
>>
>> Thanks a lot Holger
>>
>
> Andy


Re: How to insert something into the first executing block of a SPARQL query in ARQ

Posted by Andy Seaborne <an...@apache.org>.
On 18/02/12 08:48, Holger Knublauch wrote:
>
> On Feb 18, 2012, at 12:35 AM, Andy Seaborne wrote:
>> Another suggestion: put a marker pattern into the query and then
>> you can find it again.
>>
>> Put in the triple <tq:holger>  <tq:holger>  <tq:holger>  .
>
> Thanks, Andy, for looking into this. Unfortunately those queries are
> not written by myself, and they might be anything. So I cannot rely
> on a marker triple. It's basically a feature of the SPIN framework in
> which rules and constraints can be attached to classes. The variable
> ?this is then supposed to be (pre) bound to all instances of those
> classes. My current solution was to insert this clause that iterates
> over all instances where ?SOME_TYPE is pre-bound to the class using
> initial bindings. This works fine in most cases, but if some query
> only uses a function call (and BIND) then the ?this variable may not
> already be bound and the function call fails.

Why not prebind it?

     // Imports assumed: org.apache.jena.query.* and org.apache.jena.rdf.model.*
     // (com.hp.hpl.jena.* in Jena 2.x releases).
     public static void main(String... args)
     {
         Model m = ModelFactory.createDefaultModel() ;
         Query q = QueryFactory.create("SELECT * { BIND(?x+1 AS ?y) }") ;
         // Pre-bind ?x to 1 through an initial binding.
         QuerySolutionMap map = new QuerySolutionMap() ;
         map.add("x", m.createTypedLiteral(1)) ;
         QueryExecution qexec = QueryExecutionFactory.create(q, m, map) ;
         ResultSetFormatter.out(qexec.execSelect()) ;
     }

> Is there perhaps a way to control the table unit (if that's always
> there), so that this could instead be a BGP?

Yes - write an algebra transform to replace (table unit) with what you want.

Subclass TransformCopy and override

public Op transform(OpTable opTable)
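
To illustrate the shape of such a rewrite without needing Jena on the classpath, the sketch below uses a few toy stand-in classes. They are hypothetical and are NOT ARQ's Op classes; in ARQ the same idea would live in a TransformCopy subclass. It shows only the recursion: rebuild the tree and swap each (table unit) leaf for the replacement pattern.

```java
/** Toy stand-ins for algebra operators; these are NOT ARQ's Op classes. */
interface Op { }

final class TableUnit implements Op {
    @Override public String toString() { return "(table unit)"; }
}

final class Bgp implements Op {
    final String triple;
    Bgp(String triple) { this.triple = triple; }
    @Override public String toString() { return "(bgp (triple " + triple + "))"; }
}

final class Extend implements Op {
    final String var;
    final String expr;
    final Op sub;
    Extend(String var, String expr, Op sub) { this.var = var; this.expr = expr; this.sub = sub; }
    @Override public String toString() { return "(extend ((" + var + " " + expr + ")) " + sub + ")"; }
}

final class Filter implements Op {
    final String expr;
    final Op sub;
    Filter(String expr, Op sub) { this.expr = expr; this.sub = sub; }
    @Override public String toString() { return "(filter " + expr + " " + sub + ")"; }
}

final class ReplaceTableUnit {
    /** Returns a copy of op in which every (table unit) leaf is replaced. */
    static Op rewrite(Op op, Op replacement) {
        if (op instanceof TableUnit) {
            return replacement;
        }
        if (op instanceof Extend) {
            Extend e = (Extend) op;
            return new Extend(e.var, e.expr, rewrite(e.sub, replacement));
        }
        if (op instanceof Filter) {
            Filter f = (Filter) op;
            return new Filter(f.expr, rewrite(f.sub, replacement));
        }
        return op; // other leaves, such as an existing BGP, stay as they are
    }
}
```

Note that this replaces every (table unit) it finds, so a query with several such leaves would get the pattern in several places; whether that is wanted depends on the query.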

> Maybe I could
> pre-process the ops but I don't understand the algebra mechanism well
> enough on a reliable pattern on what replacement I would need to do.
> A syntactic insertion based on { sounds brittle, esp since there are
> things like GRAPH etc that switch to a completely different query
> graph.
>
> Maybe I need to really do the iteration "outside" for certain rules
> and constraints if there is no simple place for a ?this binding BGP.
> This will be slower but at least correct. Do you have any other
> ideas?

I think I understand your description but the details (of SPIN setup) 
matter here so maybe it would be better to discuss real code.  Do you 
have a complete example?

>
> Thanks a lot Holger
>

	Andy

Re: How to insert something into the first executing block of a SPARQL query in ARQ

Posted by Holger Knublauch <ho...@knublauch.com>.
On Feb 18, 2012, at 12:35 AM, Andy Seaborne wrote:
> Another suggestion: put a marker pattern into the query and then you can find it again.
> 
> Put in the triple
>    <tq:holger> <tq:holger> <tq:holger> .

Thanks, Andy, for looking into this. Unfortunately those queries are not written by myself, and they might be anything. So I cannot rely on a marker triple. It's basically a feature of the SPIN framework in which rules and constraints can be attached to classes. The variable ?this is then supposed to be (pre) bound to all instances of those classes. My current solution was to insert this clause that iterates over all instances where ?SOME_TYPE is pre-bound to the class using initial bindings. This works fine in most cases, but if some query only uses a function call (and BIND) then the ?this variable may not already be bound and the function call fails.

Is there perhaps a way to control the table unit (if that's always there), so that this could instead be a BGP? Maybe I could pre-process the ops, but I don't understand the algebra mechanism well enough to identify a reliable pattern for the replacement I would need to do. A syntactic insertion based on { sounds brittle, especially since there are things like GRAPH that switch to a completely different query graph.

Maybe I need to really do the iteration "outside" for certain rules and constraints if there is no simple place for a ?this binding BGP. This will be slower but at least correct. Do you have any other ideas?

Thanks a lot
Holger


Re: How to insert something into the first executing block of a SPARQL query in ARQ

Posted by Andy Seaborne <an...@apache.org>.
On 16/02/12 02:04, Holger Knublauch wrote:
> I have a rather advanced question and there may not be a general
> solution, but I'll try anyway...
>
> Problem: Given an arbitrary SPARQL WHERE clause, I want to insert a
> line
>
>     ?this a ?SOME_TYPE .
>
> so that it always gets executed "first" - I want ?this to be bound to
> all instances of a given class before the rest of the WHERE clause
> executes. For example if I have

The optimizer will only rearrange things when it does not change the 
answers.

Putting it first, just after the {, should be OK, but working on the 
algebra is more robust.

BIND operates (like FILTER) after the patterns matched in that 
particular block so anywhere in the block will work.

>
> WHERE {
>      {
>          BIND (ex:fun(?this) AS ?result) .
>      }
>      FILTER (?result != 1)
> }

which is:

      (filter (!= ?result 1)
         (extend ((?result (ex:fun ?this)))
           (table unit)))

i.e. a filter of
      a bind of
      a pattern match

with (table unit) being the empty basic graph pattern.

>
> then I want to programmatically rewrite this to
>
> WHERE {
>      {
>          ?this a ?SOME_TYPE .
>          BIND (ex:fun(?this) AS ?result) .
>      }
>      FILTER (?result != 1)
> }

That is, you want:

       (filter (!= ?result 1)
         (extend ((?result (ex:fun ?this)))
           (bgp (triple ?this rdf:type ?SOME_TYPE))))

> Is there any clean way of implementing this? My naive approach would
> be to walk the open parentheses until I find the first one with a
> matching closing parenthesis (or do the equivalent operation on the
> Syntax tree instead of strings. Given the policy that SPARQL is
> executed from the inside out, is this a safe assumption? Are there
> alternative approaches to achieving the goal (of iterating over all
> instances of a class)? Are there cases where I need to repeat the
> binding of ?this in multiple sub-blocks? Or will it be more efficient
> to do the iteration "outside" and simply ask the same query for each
> instance, with ?this pre-bound with initial bindings?

Another suggestion: put a marker pattern into the query and then you can 
find it again.

Put in the triple
     <tq:holger> <tq:holger> <tq:holger> .

and then you can go and find it again so you know where it is after 
parsing and algebra generation.

       (filter (!= ?result 1)
         (extend ((?result (ex:fun ?this)))
           (bgp (triple <tq:holger> <tq:holger> <tq:holger>))))

	Andy