Posted to dev@jena.apache.org by Simon Helsen <sh...@ca.ibm.com> on 2012/07/17 00:27:34 UTC
optimization opportunity lost in 2.7.x which existed in 2.6.x?
Hi everyone,
some of our clients were reporting a rather severe performance breakdown
on top of 2.7.x. After further investigation, it turns out that they had
queries whose optimized plan was suddenly very weak for the given
repository layout and shape. Strangely enough, putting some curlies around
the right triple patterns was sufficient to push the optimizer to do the
correct optimization. Below is the original query and its plan and the
adjusted query and its plan.
I have 2 questions:
1) it seems this behavior changed against 2.6.x. Is this a known issue,
e.g. a change which was required to avert a bug?
2) it is not clear to me why the optimizer needs the curlies in order to
do the right thing. I.e. why it cannot achieve the same in the original
query
thanks
Simon
Original query:
PREFIX t1:<t1>
PREFIX t2:<t2>
PREFIX t3:<t3>
PREFIX t4:<t4>
PREFIX t5:<t5>
SELECT DISTINCT ?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5
?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10
WHERE
{ ?R1 t1:p0 <https://host/jts/process/project-areas/p1>
FILTER ( ?R1 = <https://host/rm/resources/r1> )
{ ?R1 t1:p1 ?R1_v6 }
OPTIONAL
{ ?R1 t2:p2 ?R1_v9 }
OPTIONAL
{ ?R1 t2:p3 ?R1_v7 }
OPTIONAL
{ ?R1 t2:p4 ?R1_v10 }
OPTIONAL
{ ?R1 t2:p5 ?R1_v8 }
OPTIONAL
{ ?R1 t3:p6 ?R1_v1 .
?R1_v1 t2:p5 ?R1_uv2
}
OPTIONAL
{ ?R1 t3:p7 ?R1_v2 }
OPTIONAL
{ ?R1 t3:p8 ?R1_v5 }
OPTIONAL
{ ?R1 t3:p9 ?R1_v4 }
OPTIONAL
{ ?R1 t4:p10 ?R1_v3 .
?R1_v3 t5:p11 ?R1_uv1
}
?R1 t5:p12 t3:Artifact .
?R1 t1:p0 ?R1_resourceContext
}
Original plan:
(prefix ((t4: <file:///C:/Temp/t4>)
         (t5: <file:///C:/Temp/t5>)
         (t1: <file:///C:/Temp/t1>)
         (t2: <file:///C:/Temp/t2>)
         (t3: <file:///C:/Temp/t3>))
  (distinct
    (project (?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5
              ?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10)
      (filter (= ?R1 <https://host/rm/resources/r1>)
        (sequence
          (conditional
            (conditional
              (conditional
                (conditional
                  (conditional
                    (conditional
                      (conditional
                        (conditional
                          (conditional
                            (bgp
                              (triple ?R1 t1:p0 <https://host/jts/process/project-areas/p1>)
                              (triple ?R1 t1:p1 ?R1_v6)
                            )
                            (bgp (triple ?R1 t2:p2 ?R1_v9)))
                          (bgp (triple ?R1 t2:p3 ?R1_v7)))
                        (bgp (triple ?R1 t2:p4 ?R1_v10)))
                      (bgp (triple ?R1 t2:p5 ?R1_v8)))
                    (bgp
                      (triple ?R1 t3:p6 ?R1_v1)
                      (triple ?R1_v1 t2:p5 ?R1_uv2)
                    ))
                  (bgp (triple ?R1 t3:p7 ?R1_v2)))
                (bgp (triple ?R1 t3:p8 ?R1_v5)))
              (bgp (triple ?R1 t3:p9 ?R1_v4)))
            (bgp
              (triple ?R1 t4:p10 ?R1_v3)
              (triple ?R1_v3 t5:p11 ?R1_uv1)
            ))
          (bgp
            (triple ?R1 t5:p12 t3:Artifact)
            (triple ?R1 t1:p0 ?R1_resourceContext)
          ))))))
Adjusted query (the first 2 constraints and their filter have been
surrounded by curlies):
PREFIX t1:<t1>
PREFIX t2:<t2>
PREFIX t3:<t3>
PREFIX t4:<t4>
PREFIX t5:<t5>
SELECT DISTINCT ?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5
?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10
WHERE
{
{
?R1 t1:p0 <https://host/jts/process/project-areas/p1>
FILTER ( ?R1 = <https://host/rm/resources/r1> )
{ ?R1 t1:p1 ?R1_v6 }
}
OPTIONAL
{ ?R1 t2:p2 ?R1_v9 }
OPTIONAL
{ ?R1 t2:p3 ?R1_v7 }
OPTIONAL
{ ?R1 t2:p4 ?R1_v10 }
OPTIONAL
{ ?R1 t2:p5 ?R1_v8 }
OPTIONAL
{ ?R1 t3:p6 ?R1_v1 .
?R1_v1 t2:p5 ?R1_uv2
}
OPTIONAL
{ ?R1 t3:p7 ?R1_v2 }
OPTIONAL
{ ?R1 t3:p8 ?R1_v5 }
OPTIONAL
{ ?R1 t3:p9 ?R1_v4 }
OPTIONAL
{ ?R1 t4:p10 ?R1_v3 .
?R1_v3 t5:p11 ?R1_uv1
}
?R1 t5:p12 t3:Artifact .
?R1 t1:p0 ?R1_resourceContext
}
Adjusted plan:
(prefix ((t4: <file:///C:/Temp/t4>)
         (t5: <file:///C:/Temp/t5>)
         (t1: <file:///C:/Temp/t1>)
         (t2: <file:///C:/Temp/t2>)
         (t3: <file:///C:/Temp/t3>))
  (distinct
    (project (?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5
              ?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10)
      (sequence
        (conditional
          (conditional
            (conditional
              (conditional
                (conditional
                  (conditional
                    (conditional
                      (conditional
                        (conditional
                          (assign ((?R1 <https://host/rm/resources/r1>))
                            (bgp
                              (triple <https://host/rm/resources/r1> t1:p0 <https://host/jts/process/project-areas/p1>)
                              (triple <https://host/rm/resources/r1> t1:p1 ?R1_v6)
                            ))
                          (bgp (triple ?R1 t2:p2 ?R1_v9)))
                        (bgp (triple ?R1 t2:p3 ?R1_v7)))
                      (bgp (triple ?R1 t2:p4 ?R1_v10)))
                    (bgp (triple ?R1 t2:p5 ?R1_v8)))
                  (bgp
                    (triple ?R1 t3:p6 ?R1_v1)
                    (triple ?R1_v1 t2:p5 ?R1_uv2)
                  ))
                (bgp (triple ?R1 t3:p7 ?R1_v2)))
              (bgp (triple ?R1 t3:p8 ?R1_v5)))
            (bgp (triple ?R1 t3:p9 ?R1_v4)))
          (bgp
            (triple ?R1 t4:p10 ?R1_v3)
            (triple ?R1_v3 t5:p11 ?R1_uv1)
          ))
        (bgp
          (triple ?R1 t5:p12 t3:Artifact)
          (triple ?R1 t1:p0 ?R1_resourceContext)
        )))))
Note the (assign ((?R1 <https://host/rm/resources/r1>)) ...) step, which
makes the query scalable on a large repository.
Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?
Posted by Simon Helsen <sh...@ca.ibm.com>.
Andy,
ok, so the group insertion is a different query, but even if that is the
case, why can the FILTER assignment not take place in all groups?
Note that in 2.6.x, something along these lines did happen (I don't have
the plan around, but we know because of our performance numbers).
Simon
Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?
Posted by Andy Seaborne <an...@apache.org>.
On 19/07/12 01:52, Simon Helsen wrote:
> Sorry, I am looking at this again. Are you saying that in
>
> (prefix ((p1: <file:///C:/Temp/t1>)
> (p2: <file:///C:/Temp/t2>))
> (distinct
> (project (?R1 ?optionalValue)
> (conditional
> (assign ((?R1 <https://perfrrs.ibm.com:9443/rm/resources/_r1>))
> (bgp (triple <https://perfrrs.ibm.com:9443/rm/resources/_r1>
> p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
> (bgp (triple ?R1 p2:pr2 ?optionalValue))))))
>
> ?R1 is not always bound to
> <https://perfrrs.ibm.com:9443/rm/resources/_r1>?
It is bound - (conditional) flows the bindings from the left into the
right. It's a left-index-join with no scope restrictions.
(conditional) and (sequence) are used when no scope issues arise
(e.g. doubly nested optionals with no use of the variable in the
middle optional, or use of FILTER variables in out-of-scope places).
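A minimal sketch of the doubly nested OPTIONAL case mentioned above; the ex: prefix and the property names are hypothetical and not taken from this thread. ?x occurs in the outer pattern and in the innermost OPTIONAL but not in the middle one, so passing the outer binding of ?x down into the inner OPTIONAL (as a left-index-join would) can give different answers from evaluating the inner OPTIONAL on its own first, which is what the SPARQL algebra specifies.

```sparql
PREFIX ex: <http://example.org/>
SELECT *
WHERE {
  ?x ex:p ?v
  OPTIONAL {
    ?y ex:q ?w                 # middle group: no mention of ?x
    OPTIONAL { ?x ex:r ?z }    # innermost group uses ?x again
  }
}
```

For a pattern shaped like this, one would presumably expect the plan to keep a plain (leftjoin) rather than (conditional).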
> And therefore, are you
> saying that ?R1 can be any value which satisfies p2:pr2 ?optionalValue ?
>
> I still don't understand why the assign optimization cannot be pushed
> into the optional in the query
I don't know why the optimization is not taking place. I would have
expected it to occur in the shorter example at least, and the longer one
looks like the same structure when you indent it (the sparql.org query
validator pretty-prints queries; it's a web service around the engine of
arq.qparse). But I haven't looked at the code in depth yet to see whether
some corner case is blocking it or whether it has something to do with a
change in the order in which optimization strategies are applied.
Andy
> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> ?R1 p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> FILTER ( ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r1>)
> }
>
> is there perhaps a different way to express this?
>
> Simon
Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?
Posted by Simon Helsen <sh...@ca.ibm.com>.
Sorry, I am looking at this again. Are you saying that in
(prefix ((p1: <file:///C:/Temp/t1>)
         (p2: <file:///C:/Temp/t2>))
  (distinct
    (project (?R1 ?optionalValue)
      (conditional
        (assign ((?R1 <https://perfrrs.ibm.com:9443/rm/resources/_r1>))
          (bgp (triple <https://perfrrs.ibm.com:9443/rm/resources/_r1>
                       p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
        (bgp (triple ?R1 p2:pr2 ?optionalValue))))))
?R1 is not always bound to <https://perfrrs.ibm.com:9443/rm/resources/_r1>?
And therefore, are you saying that ?R1 can be any value which satisfies
p2:pr2 ?optionalValue ?
I still don't understand why the assign optimization cannot be pushed into
the optional in the query
PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{
?R1 p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>
OPTIONAL
{ ?R1 p2:pr2 ?optionalValue }
FILTER ( ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r1>)
}
is there perhaps a different way to express this?
Simon
Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?
Posted by Andy Seaborne <an...@apache.org>.
On 18/07/12 16:25, Simon Helsen wrote:
> Andy,
>
> I have simplified the scenario a little bit. So the following query
>
> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }
>
> leads to
>
> (prefix ((p1: <file:///C:/Temp/t1>)
> (p2: <file:///C:/Temp/t2>))
> (distinct
> (project (?R1 ?optionalValue)
> (filter (= ?R1 <https://host:9443/rm/resources/_r1>)
> (conditional
> (bgp (triple ?R1 p1:pr1
> <https://host:9443/jts/process/project-areas/p>))
> (bgp (triple ?R1 p2:pr2 ?optionalValue)))))))
>
> whereas the following query
>
> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> }
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }
>
> leads to
>
> (prefix ((p1: <file:///C:/Temp/t1>)
> (p2: <file:///C:/Temp/t2>))
> (distinct
> (project (?R1 ?optionalValue)
> (conditional
> (assign ((?R1 <https://host:9443/rm/resources/_r1>))
> (bgp (triple <https://host:9443/rm/resources/_r1> p1:pr1
> <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
> (bgp (triple ?R1 p2:pr2 ?optionalValue))))))
>
> So, looking at this in more detail, the former plan surprises me a bit
> because it proposes to evaluate p1:pr1 as optional, even though I didn't
> express that in the query.
Yes, you did.
"conditional" is a binary operator, an optimized form of leftjoin. The
first arg is the fixed side and the second arg is the conditional part.
{ ?s ?p ?o OPTIONAL { ?s1 ?p1 ?o1 } }
becomes
(leftjoin
  (bgp (triple ?s ?p ?o))
  (bgp (triple ?s1 ?p1 ?o1)))
which is executed as
(conditional
  (bgp (triple ?s ?p ?o))
  (bgp (triple ?s1 ?p1 ?o1)))
(sequence) is rather different.
> WHERE
> {
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> }
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }
is a different query because FILTERs are group-wide, a group being what
is between {} (with the caveat of a special case in OPTIONAL that does
not occur here).
Adding the extra {} means the FILTER is applied to
?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
alone.
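To illustrate the scoping rule with a case where the two forms actually return different results, here is a small sketch; the ex: prefix, class and property are hypothetical and not taken from this thread. In the first query the FILTER belongs to the outer group, so it is applied after the OPTIONAL and can see ?label:

```sparql
PREFIX ex: <http://example.org/>
SELECT ?s ?label
WHERE {
  ?s a ex:Thing
  FILTER ( bound(?label) )          # scope: the whole outer group
  OPTIONAL { ?s ex:label ?label }
}
```

In the second query the extra {} limits the FILTER to the inner group, where ?label is never bound, so every solution is rejected and the result is empty:

```sparql
PREFIX ex: <http://example.org/>
SELECT ?s ?label
WHERE {
  { ?s a ex:Thing
    FILTER ( bound(?label) ) }      # scope: only this inner group
  OPTIONAL { ?s ex:label ?label }
}
```

In the queries from this thread the FILTER mentions only ?R1, which the first triple pattern binds in both forms, so the answers should come out the same and only the algebra (and hence the plan) differs.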
Andy
Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?
Posted by Simon Helsen <sh...@ca.ibm.com>.
Andy,
I have simplified the scenario a little bit. So the following query
PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{
?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
OPTIONAL
{ ?R1 p2:pr2 ?optionalValue }
}
leads to
(prefix ((p1: <file:///C:/Temp/t1>)
         (p2: <file:///C:/Temp/t2>))
  (distinct
    (project (?R1 ?optionalValue)
      (filter (= ?R1 <https://host:9443/rm/resources/_r1>)
        (conditional
          (bgp (triple ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>))
          (bgp (triple ?R1 p2:pr2 ?optionalValue)))))))
whereas the following query
PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{
{
?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
}
OPTIONAL
{ ?R1 p2:pr2 ?optionalValue }
}
leads to
(prefix ((p1: <file:///C:/Temp/t1>)
         (p2: <file:///C:/Temp/t2>))
  (distinct
    (project (?R1 ?optionalValue)
      (conditional
        (assign ((?R1 <https://host:9443/rm/resources/_r1>))
          (bgp (triple <https://host:9443/rm/resources/_r1> p1:pr1
                       <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
        (bgp (triple ?R1 p2:pr2 ?optionalValue))))))
So, looking at this in more detail, the former plan surprises me a bit
because it proposes to evaluate p1:pr1 as optional, even though I didn't
express that in the query. To answer the rest of the question, the intent
of the query is to bind ?R1 to a known fixed resource uri and also
retrieve any optional predicates with that resource if present (in the
example p2:pr2). The reason to use the filter expression is that in
general, there may be a number of resources, so the following kind of
query
PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{
?R1 p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>
OPTIONAL { ?R1 p2:pr2 ?optionalValue }
FILTER ( ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r1>
      || ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r2> )
}
would be a typical variation
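One possible alternative for the multi-resource case, sketched under the assumption that the engine in use accepts the SPARQL 1.1 VALUES clause (not confirmed for the Jena versions discussed here), is to supply the candidate resources as inline data instead of a disjunctive FILTER. Whether the optimizer produces a better plan for this form has not been checked.

```sparql
PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{
  # Inline data: ?R1 is restricted to exactly these resources
  VALUES ?R1 { <https://perfrrs.ibm.com:9443/rm/resources/_r1>
               <https://perfrrs.ibm.com:9443/rm/resources/_r2> }
  ?R1 p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>
  OPTIONAL { ?R1 p2:pr2 ?optionalValue }
}
```

The inline data is joined with the rest of the group, so the result should match the FILTER version as long as ?R1 is only ever bound to IRIs.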
Simon
Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?
Posted by Andy Seaborne <an...@apache.org>.
Simon,
The work for this optimization is done in TransformFilterEquality.
Is there a simpler (= shorter) query that exhibits this behaviour? Does
it depend on the number of OPTIONALS?
Aside from the report, is that structure intended, with a top-level BGP at
the end of the query, a nested one after the FILTER and a BGP at the
start - or was a single BGP meant?
Andy