Posted to dev@jena.apache.org by Simon Helsen <sh...@ca.ibm.com> on 2012/07/17 00:27:34 UTC

optimization opportunity lost in 2.7.x which existed in 2.6.x?

Hi everyone,

some of our clients were reporting a rather severe performance breakdown 
on top of 2.7.x. After further investigation, it turns out that they had 
queries whose optimized plan was suddenly very weak for the given 
repository layout and shape. Strangely enough, putting some curlies around 
the right triple patterns was sufficient to push the optimizer to do the 
correct optimization. Below is the original query and its plan and the 
adjusted query and its plan. 

I have 2 questions: 

1) it seems this behavior changed against 2.6.x. Is this a known issue, 
e.g. a change which was required to avert a bug?
2) it is not clear to me why the optimizer needs the curlies in order to 
do the right thing. I.e. why it cannot achieve the same in the original 
query

thanks

Simon

Original query:

PREFIX t1:<t1>
PREFIX t2:<t2>
PREFIX t3:<t3>
PREFIX t4:<t4>
PREFIX t5:<t5>
SELECT DISTINCT ?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5 
?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10
WHERE
{ ?R1 t1:p0 <https://host/jts/process/project-areas/p1>
FILTER ( ?R1 = <https://host/rm/resources/r1> )
{ ?R1 t1:p1 ?R1_v6 }
OPTIONAL
{ ?R1 t2:p2 ?R1_v9 }
OPTIONAL
{ ?R1 t2:p3 ?R1_v7 }
OPTIONAL
{ ?R1 t2:p4 ?R1_v10 }
OPTIONAL
{ ?R1 t2:p5 ?R1_v8 }
OPTIONAL
{ ?R1 t3:p6 ?R1_v1 .
?R1_v1 t2:p5 ?R1_uv2
}
OPTIONAL
{ ?R1 t3:p7 ?R1_v2 }
OPTIONAL
{ ?R1 t3:p8 ?R1_v5 }
OPTIONAL
{ ?R1 t3:p9 ?R1_v4 }
OPTIONAL
{ ?R1 t4:p10 ?R1_v3 .
?R1_v3 t5:p11 ?R1_uv1
}
?R1 t5:p12 t3:Artifact .
?R1 t1:p0 ?R1_resourceContext
}

Original plan:

(prefix ((t4: <file:///C:/Temp/t4>)
         (t5: <file:///C:/Temp/t5>)
         (t1: <file:///C:/Temp/t1>)
         (t2: <file:///C:/Temp/t2>)
         (t3: <file:///C:/Temp/t3>))
  (distinct
    (project (?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5 
?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10)
      (filter (= ?R1 <https://host/rm/resources/r1>)
        (sequence
          (conditional
            (conditional
              (conditional
                (conditional
                  (conditional
                    (conditional
                      (conditional
                        (conditional
                          (conditional
                            (bgp
                              (triple ?R1 t1:p0 <https://host/jts/process/project-areas/p1>)
                              (triple ?R1 t1:p1 ?R1_v6)
                            )
                            (bgp (triple ?R1 t2:p2 ?R1_v9)))
                          (bgp (triple ?R1 t2:p3 ?R1_v7)))
                        (bgp (triple ?R1 t2:p4 ?R1_v10)))
                      (bgp (triple ?R1 t2:p5 ?R1_v8)))
                    (bgp
                      (triple ?R1 t3:p6 ?R1_v1)
                      (triple ?R1_v1 t2:p5 ?R1_uv2)
                    ))
                  (bgp (triple ?R1 t3:p7 ?R1_v2)))
                (bgp (triple ?R1 t3:p8 ?R1_v5)))
              (bgp (triple ?R1 t3:p9 ?R1_v4)))
            (bgp
              (triple ?R1 t4:p10 ?R1_v3)
              (triple ?R1_v3 t5:p11 ?R1_uv1)
            ))
          (bgp
            (triple ?R1 t5:p12 t3:Artifact)
            (triple ?R1 t1:p0 ?R1_resourceContext)
          ))))))

Adjusted query (the first 2 constraints and its filter have been 
surrounded by curlies):

PREFIX t1:<t1>
PREFIX t2:<t2>
PREFIX t3:<t3>
PREFIX t4:<t4>
PREFIX t5:<t5>
SELECT DISTINCT ?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5 
?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10
WHERE
{ 
{
?R1 t1:p0 <https://host/jts/process/project-areas/p1>
FILTER ( ?R1 = <https://host/rm/resources/r1> )
{ ?R1 t1:p1 ?R1_v6 }
}
OPTIONAL
{ ?R1 t2:p2 ?R1_v9 }
OPTIONAL
{ ?R1 t2:p3 ?R1_v7 }
OPTIONAL
{ ?R1 t2:p4 ?R1_v10 }
OPTIONAL
{ ?R1 t2:p5 ?R1_v8 }
OPTIONAL
{ ?R1 t3:p6 ?R1_v1 .
?R1_v1 t2:p5 ?R1_uv2
}
OPTIONAL
{ ?R1 t3:p7 ?R1_v2 }
OPTIONAL
{ ?R1 t3:p8 ?R1_v5 }
OPTIONAL
{ ?R1 t3:p9 ?R1_v4 }
OPTIONAL
{ ?R1 t4:p10 ?R1_v3 .
?R1_v3 t5:p11 ?R1_uv1
}
?R1 t5:p12 t3:Artifact .
?R1 t1:p0 ?R1_resourceContext
}

Adjusted plan:

(prefix ((t4: <file:///C:/Temp/t4>)
         (t5: <file:///C:/Temp/t5>)
         (t1: <file:///C:/Temp/t1>)
         (t2: <file:///C:/Temp/t2>)
         (t3: <file:///C:/Temp/t3>))
  (distinct
    (project (?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5 
?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10)
      (sequence
        (conditional
          (conditional
            (conditional
              (conditional
                (conditional
                  (conditional
                    (conditional
                      (conditional
                        (conditional
                          (assign ((?R1 <https://host/rm/resources/r1>))
                            (bgp
                              (triple <https://host/rm/resources/r1> t1:p0 <https://host/jts/process/project-areas/p1>)
                              (triple <https://host/rm/resources/r1> t1:p1 ?R1_v6)
                            ))
                          (bgp (triple ?R1 t2:p2 ?R1_v9)))
                        (bgp (triple ?R1 t2:p3 ?R1_v7)))
                      (bgp (triple ?R1 t2:p4 ?R1_v10)))
                    (bgp (triple ?R1 t2:p5 ?R1_v8)))
                  (bgp
                    (triple ?R1 t3:p6 ?R1_v1)
                    (triple ?R1_v1 t2:p5 ?R1_uv2)
                  ))
                (bgp (triple ?R1 t3:p7 ?R1_v2)))
              (bgp (triple ?R1 t3:p8 ?R1_v5)))
            (bgp (triple ?R1 t3:p9 ?R1_v4)))
          (bgp
            (triple ?R1 t4:p10 ?R1_v3)
            (triple ?R1_v3 t5:p11 ?R1_uv1)
          ))
        (bgp
          (triple ?R1 t5:p12 t3:Artifact)
          (triple ?R1 t1:p0 ?R1_resourceContext)
        )))))

Note the (assign ((?R1 <https://host/rm/resources/r1>)) ...) step, which makes the 
query scalable on a large repository.
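
To illustrate why substituting the constant into the BGP matters, here is a 
minimal Python sketch. It is not ARQ code: the toy store, the subject index, 
and the function names scan_then_filter and substitute_then_lookup are all 
hypothetical and exist only for this illustration. It counts how many triples 
each strategy has to touch.

```python
from collections import defaultdict

# Toy triple store (illustration only): 1000 resources, two triples each.
TRIPLES = [(f"r{i}", "p0", "area1") for i in range(1000)]
TRIPLES += [(f"r{i}", "p1", f"v{i}") for i in range(1000)]

# Hypothetical subject index, standing in for whatever index a real store has.
BY_SUBJECT = defaultdict(list)
for t in TRIPLES:
    BY_SUBJECT[t[0]].append(t)

def scan_then_filter(pred, obj, wanted):
    """Match (?R1 pred obj) with ?R1 unbound, then apply FILTER(?R1 = wanted)."""
    touched = 0
    rows = []
    for s, p, o in TRIPLES:
        touched += 1
        if p == pred and o == obj:
            rows.append({"R1": s})
    return [r for r in rows if r["R1"] == wanted], touched

def substitute_then_lookup(pred, obj, wanted):
    """Replace ?R1 by the constant first, then use the subject index."""
    touched = 0
    rows = []
    for s, p, o in BY_SUBJECT.get(wanted, []):
        touched += 1
        if p == pred and o == obj:
            rows.append({"R1": s})
    return rows, touched

slow_rows, slow_touched = scan_then_filter("p0", "area1", "r7")
fast_rows, fast_touched = substitute_then_lookup("p0", "area1", "r7")
```

Both strategies return the same row, but the first one visits every triple in 
the toy store while the second visits only the triples of the one subject.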

Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?

Posted by Simon Helsen <sh...@ca.ibm.com>.
Andy, 

ok, so the group insertion is a different query, but even if that is the 
case, why can the FILTER assignment not take place in all groups? 

Note that in 2.6.x, something along these lines did happen (I don't have 
the plan around, but we know because of our performance numbers).

Simon




From:
Andy Seaborne <an...@apache.org>
To:
Simon Helsen/Toronto/IBM@IBMCA
Cc:
dev@jena.apache.org
Date:
07/18/2012 05:42 PM
Subject:
Re: optimization opportunity lost in 2.7.x  which existed in 2.6.x?



On 18/07/12 16:25, Simon Helsen wrote:
> Andy,
>
> I have simplified the scenario a little bit. So the following query
>
> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }
>
> leads to
>
> (prefix ((p1: <file:///C:/Temp/t1>)
>           (p2: <file:///C:/Temp/t2>))
>    (distinct
>      (project (?R1 ?optionalValue)
>        (filter (= ?R1 <https://host:9443/rm/resources/_r1>)
>          (conditional
>            (bgp (triple ?R1 p1:pr1
> <https://host:9443/jts/process/project-areas/p>))
>            (bgp (triple ?R1 p2:pr2 ?optionalValue)))))))
>
> whereas the following query
>
> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> }
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }
>
> leads to
>
> (prefix ((p1: <file:///C:/Temp/t1>)
>           (p2: <file:///C:/Temp/t2>))
>    (distinct
>      (project (?R1 ?optionalValue)
>        (conditional
>          (assign ((?R1 <https://host:9443/rm/resources/_r1>))
>            (bgp (triple <https://host:9443/rm/resources/_r1> p1:pr1
> <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
>          (bgp (triple ?R1 p2:pr2 ?optionalValue))))))
>
> So, looking at this in more detail, the former plan surprises me a bit
> because it proposes to evaluate p1:pr1 as optional, even though I didn't
> express that in the query.

Yes, you did.

"conditional" is a binary operator, an optimized form of leftjoin.  The 
first arg is the fixed side and the second arg is the conditional part.


{ ?s ?p ?o OPTIONAL { ?s1 ?p1 ?o1 } }

becomes

(leftjoin
    (bgp (triple ?s ?p ?o))
    (bgp (triple ?s1 ?p1 ?o1)))

which is executed as

(conditional
    (bgp (triple ?s ?p ?o))
    (bgp (triple ?s1 ?p1 ?o1)))

(sequence) is rather different.

> WHERE
> {
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> }
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }

is a different query because FILTERs are group-wide, a group being what 
is between {} (caveat: a special case in OPTIONAL, not occurring here).

Adding the extra {} means the FILTER is applied to

  ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>

alone.

                 Andy




Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?

Posted by Andy Seaborne <an...@apache.org>.
On 19/07/12 01:52, Simon Helsen wrote:
> Sorry, I am looking at this again. Are you saying that in
>
> (prefix ((p1: <file:///C:/Temp/t1>)
>           (p2: <file:///C:/Temp/t2>))
>    (distinct
>      (project (?R1 ?optionalValue)
>        (conditional
>          (assign ((?R1 <https://perfrrs.ibm.com:9443/rm/resources/_r1>))
>            (bgp (triple <https://perfrrs.ibm.com:9443/rm/resources/_r1>
> p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
>          (bgp (triple ?R1 p2:pr2 ?optionalValue))))))
>
> ?R1 is not always bound to
> <https://perfrrs.ibm.com:9443/rm/resources/_r1>?

It is bound - (conditional) flows the bindings from the left into the 
right.  It's a left-index-join with no scope restrictions.

(conditional) and (sequence) are used when no scope issues 
arise (e.g. doubly nested optionals with no use of the var in the 
middle optional, use of FILTER variables in out-of-scope places).
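
A minimal Python sketch of what "flowing the bindings from the left into the 
right" can look like for a left index join. This is an illustration under 
assumptions, not ARQ's implementation: TRIPLES, match, and conditional are 
hypothetical names, and variables are written as strings starting with "?".

```python
TRIPLES = [
    ("r1", "pr1", "area"),
    ("r2", "pr1", "area"),
    ("r1", "pr2", "x"),
    ("r2", "pr2", "y"),
]

def match(pattern, binding):
    """Yield bindings that extend `binding` for one triple pattern.

    Terms starting with '?' are variables. A variable already present in
    `binding` acts like a constant, which is the 'flowing in' part.
    """
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing value.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new

def conditional(left_rows, right_pattern):
    """For each left row, try the right side; keep the row if nothing matches."""
    for row in left_rows:
        extended = list(match(right_pattern, row))
        yield from (extended if extended else [row])

# Rows as the assign-plus-bgp step on the left might produce them.
bound = list(conditional([{"?R1": "r1"}], ("?R1", "pr2", "?optionalValue")))
# A left row with no match on the right is kept unchanged.
kept = list(conditional([{"?R1": "r9"}], ("?R1", "pr2", "?optionalValue")))
```

In the sketch, ?R1 is already bound when the right-hand pattern is evaluated, 
so only triples about that one resource can match.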

> And therefore, are you
> saying that ?R1 can be any value which satisfies p2:pr2 ?optionalValue ?
>
> I still don't understand why the assign optimization cannot be pushed
> into the optional in the query

I don't know why the optimization is not taking place.  I would have 
expected it to occur in the shorter example at least and the longer one 
looks like the same structure (when you indent the thing - the 
sparql.org query validator pretty prints queries - it's a web service 
around the engine of arq.qparse).  But I haven't looked at the code 
in-depth yet to see if some corner case is blocking it or whether it is 
something to do with a change in the order optimization strategies are 
applied.

	Andy

> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> ?R1 p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue  }
> FILTER ( ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r1>)
> }
>
> is there perhaps a different way to express this?
>
> Simon
>
>



Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?

Posted by Simon Helsen <sh...@ca.ibm.com>.
Sorry, I am looking at this again. Are you saying that in

(prefix ((p1: <file:///C:/Temp/t1>)
         (p2: <file:///C:/Temp/t2>))
  (distinct
    (project (?R1 ?optionalValue)
      (conditional
        (assign ((?R1 <https://perfrrs.ibm.com:9443/rm/resources/_r1>))
          (bgp (triple <https://perfrrs.ibm.com:9443/rm/resources/_r1> p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
        (bgp (triple ?R1 p2:pr2 ?optionalValue))))))

?R1 is not always bound to <https://perfrrs.ibm.com:9443/rm/resources/_r1>?
And therefore, are you saying that ?R1 can be any value which satisfies 
p2:pr2 ?optionalValue ?

I still don't understand why the assign optimization cannot be pushed into 
the optional in the query 

PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{ 
?R1 p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>
OPTIONAL
{ ?R1 p2:pr2 ?optionalValue  }
FILTER ( ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r1>) 
}

is there perhaps a different way to express this? 

Simon







Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?

Posted by Andy Seaborne <an...@apache.org>.
On 18/07/12 16:25, Simon Helsen wrote:
> Andy,
>
> I have simplified the scenario a little bit. So the following query
>
> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }
>
> leads to
>
> (prefix ((p1: <file:///C:/Temp/t1>)
>           (p2: <file:///C:/Temp/t2>))
>    (distinct
>      (project (?R1 ?optionalValue)
>        (filter (= ?R1 <https://host:9443/rm/resources/_r1>)
>          (conditional
>            (bgp (triple ?R1 p1:pr1
> <https://host:9443/jts/process/project-areas/p>))
>            (bgp (triple ?R1 p2:pr2 ?optionalValue)))))))
>
> whereas the following query
>
> PREFIX p1:<t1>
> PREFIX p2:<t2>
> SELECT DISTINCT ?R1 ?optionalValue
> WHERE
> {
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> }
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }
>
> leads to
>
> (prefix ((p1: <file:///C:/Temp/t1>)
>           (p2: <file:///C:/Temp/t2>))
>    (distinct
>      (project (?R1 ?optionalValue)
>        (conditional
>          (assign ((?R1 <https://host:9443/rm/resources/_r1>))
>            (bgp (triple <https://host:9443/rm/resources/_r1> p1:pr1
> <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
>          (bgp (triple ?R1 p2:pr2 ?optionalValue))))))
>
> So, looking at this in more detail, the former plan surprises me a bit
> because it proposes to evaluate p1:pr1 as optional, even though I didn't
> express that in the query.

Yes, you did.

"conditional" is a binary operator, an optimized form of leftjoin.  The 
first arg is the fixed side and the second arg is the conditional part.


{ ?s ?p ?o OPTIONAL { ?s1 ?p1 ?o1 } }

becomes

(leftjoin
    (bgp (triple ?s ?p ?o))
    (bgp (triple ?s1 ?p1 ?o1)))

which is executed as

(conditional
    (bgp (triple ?s ?p ?o))
    (bgp (triple ?s1 ?p1 ?o1)))

(sequence) is rather different.

> WHERE
> {
> {
> ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
> FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
> }
> OPTIONAL
> { ?R1 p2:pr2 ?optionalValue }
> }

is a different query because FILTERs are group-wide, a group being what 
is between {} (caveat: a special case in OPTIONAL, not occurring here).

Adding the extra {} means the FILTER is applied to

  ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>

alone.

	Andy
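
The group-wide versus inner-group distinction can be sketched in Python. This 
is a toy model of left join and filter over lists of bindings, not ARQ code; 
the names compatible, left_join, and filter_eq and the sample rows are 
assumptions made for the illustration. It shows that the two placements give 
the same rows when the filter variable is bound by the required part, and 
different rows when it is bound only by the OPTIONAL part.

```python
def compatible(a, b):
    """Two bindings are compatible if they agree on every shared variable."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())

def left_join(left, right):
    """OPTIONAL: keep every left row, extended by compatible right rows if any."""
    out = []
    for l in left:
        matches = [{**l, **r} for r in right if compatible(l, r)]
        out.extend(matches if matches else [l])
    return out

def filter_eq(rows, var, value):
    """FILTER(?var = value); a row where ?var is unbound does not pass."""
    return [r for r in rows if r.get(var) == value]

required = [{"R1": "r1"}, {"R1": "r2"}]
optional = [{"R1": "r1", "optionalValue": "a"},
            {"R1": "r3", "optionalValue": "c"}]

# Filter on ?R1, which the required part always binds: same rows either way.
group_wide = filter_eq(left_join(required, optional), "R1", "r1")
inner_only = left_join(filter_eq(required, "R1", "r1"), optional)

# Filter on ?optionalValue, bound only by the OPTIONAL part: the rows differ.
late = filter_eq(left_join(required, optional), "optionalValue", "a")
early = left_join(filter_eq(required, "optionalValue", "a"), optional)
```

So the two query texts are different algebra expressions in general, even 
though for a filter on ?R1 in this thread's example they select the same rows.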

Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?

Posted by Simon Helsen <sh...@ca.ibm.com>.
Andy,

I have simplified the scenario a little bit. So the following query

PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{ 
?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
OPTIONAL
{ ?R1 p2:pr2 ?optionalValue }
}

leads to

(prefix ((p1: <file:///C:/Temp/t1>)
         (p2: <file:///C:/Temp/t2>))
  (distinct
    (project (?R1 ?optionalValue)
      (filter (= ?R1 <https://host:9443/rm/resources/_r1>)
        (conditional
          (bgp (triple ?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>))
          (bgp (triple ?R1 p2:pr2 ?optionalValue)))))))

whereas the following query 

PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{ 
{
?R1 p1:pr1 <https://host:9443/jts/process/project-areas/p>
FILTER ( ?R1 = <https://host:9443/rm/resources/_r1>)
}
OPTIONAL
{ ?R1 p2:pr2 ?optionalValue }
}

leads to

(prefix ((p1: <file:///C:/Temp/t1>)
         (p2: <file:///C:/Temp/t2>))
  (distinct
    (project (?R1 ?optionalValue)
      (conditional
        (assign ((?R1 <https://host:9443/rm/resources/_r1>))
          (bgp (triple <https://host:9443/rm/resources/_r1> p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>)))
        (bgp (triple ?R1 p2:pr2 ?optionalValue))))))

So, looking at this in more detail, the former plan surprises me a bit 
because it proposes to evaluate p1:pr1 as optional, even though I didn't 
express that in the query. To answer the rest of the question, the intent 
of the query is to bind ?R1 to a known fixed resource uri and also 
retrieve any optional predicates with that resource if present (in the 
example p2:pr2). The reason to use the filter expression is that in 
general, there may be a number of resources, so the following kind of 
query

PREFIX p1:<t1>
PREFIX p2:<t2>
SELECT DISTINCT ?R1 ?optionalValue
WHERE
{ 
?R1 p1:pr1 <https://perfjts.ibm.com:9443/jts/process/project-areas/p>
OPTIONAL { ?R1 p2:pr2 ?optionalValue }
FILTER ( ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r1>
      || ?R1 = <https://perfrrs.ibm.com:9443/rm/resources/_r2> )
}

would be a typical variation

Simon
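
For the multi-resource variation, a minimal Python sketch of one way a 
disjunction of equalities could be evaluated: once per constant, with the 
results concatenated. The thread does not say whether ARQ 2.7.x rewrites such 
filters this way, so treat this purely as an illustration; TRIPLES and 
subjects_matching are hypothetical names.

```python
TRIPLES = [
    ("r1", "pr1", "area"),
    ("r2", "pr1", "area"),
    ("r3", "pr1", "area"),
]

def subjects_matching(pred, obj, subject=None):
    """Subjects s with (s pred obj) in the store; optionally pin the subject."""
    return [s for s, p, o in TRIPLES
            if p == pred and o == obj and (subject is None or s == subject)]

wanted = ["r1", "r2"]

# FILTER(?R1 = r1 || ?R1 = r2) applied after matching with ?R1 unbound.
filtered = [s for s in subjects_matching("pr1", "area") if s in wanted]

# One pinned evaluation per constant, results concatenated (a union).
per_constant = [s for c in wanted
                for s in subjects_matching("pr1", "area", subject=c)]
```

Both lists contain the same subjects; the second form never has to consider 
resources outside the wanted set if the store can look up by subject.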








Re: optimization opportunity lost in 2.7.x which existed in 2.6.x?

Posted by Andy Seaborne <an...@apache.org>.
Simon,

The work for this optimization is done in TransformFilterEquality.

Is there a simpler (= shorter) query that exhibits this behaviour?  Does 
it depend on the number of OPTIONALS?

Aside from the report, is that structure intended with a top-level BGP at 
the end of the query, a nested one after the FILTER and a BGP at the 
start - or was a single BGP meant?

	Andy


> WHERE
> {
> {
> ?R1 t1:p0 <https://host/jts/process/project-areas/p1>
> FILTER ( ?R1 = <https://host/rm/resources/r1> )
> { ?R1 t1:p1 ?R1_v6 }
> }
> OPTIONAL
> { ?R1 t2:p2 ?R1_v9 }
> OPTIONAL
> { ?R1 t2:p3 ?R1_v7 }
> OPTIONAL
> { ?R1 t2:p4 ?R1_v10 }
> OPTIONAL
> { ?R1 t2:p5 ?R1_v8 }
> OPTIONAL
> { ?R1 t3:p6 ?R1_v1 .
> ?R1_v1 t2:p5 ?R1_uv2
> }
> OPTIONAL
> { ?R1 t3:p7 ?R1_v2 }
> OPTIONAL
> { ?R1 t3:p8 ?R1_v5 }
> OPTIONAL
> { ?R1 t3:p9 ?R1_v4 }
> OPTIONAL
> { ?R1 t4:p10 ?R1_v3 .
> ?R1_v3 t5:p11 ?R1_uv1
> }
> ?R1 t5:p12 t3:Artifact .
> ?R1 t1:p0 ?R1_resourceContext
> }
>
> Adjusted plan:
>
> (prefix ((t4: <file:///C:/Temp/t4>)
>           (t5: <file:///C:/Temp/t5>)
>           (t1: <file:///C:/Temp/t1>)
>           (t2: <file:///C:/Temp/t2>)
>           (t3: <file:///C:/Temp/t3>))
>    (distinct
>      (project (?R1 ?R1_resourceContext ?R1_v1 ?R1_v2 ?R1_v3 ?R1_v4 ?R1_v5
> ?R1_v6 ?R1_v7 ?R1_v8 ?R1_v9 ?R1_v10)
>        (sequence
>          (conditional
>            (conditional
>              (conditional
>                (conditional
>                  (conditional
>                    (conditional
>                      (conditional
>                        (conditional
>                          (conditional
>                            (assign ((?R1 <https://host/rm/resources/r1>))
>                              (bgp
>                                (triple <https://host/rm/resources/r1> t1:p0
> <https://host/jts/process/project-areas/p1>)
>                                (triple <https://host/rm/resources/r1> t1:p1
> ?R1_v6)
>                              ))
>                            (bgp (triple ?R1 t2:p2 ?R1_v9)))
>                          (bgp (triple ?R1 t2:p3 ?R1_v7)))
>                        (bgp (triple ?R1 t2:p4 ?R1_v10)))
>                      (bgp (triple ?R1 t2:p5 ?R1_v8)))
>                    (bgp
>                      (triple ?R1 t3:p6 ?R1_v1)
>                      (triple ?R1_v1 t2:p5 ?R1_uv2)
>                    ))
>                  (bgp (triple ?R1 t3:p7 ?R1_v2)))
>                (bgp (triple ?R1 t3:p8 ?R1_v5)))
>              (bgp (triple ?R1 t3:p9 ?R1_v4)))
>            (bgp
>              (triple ?R1 t4:p10 ?R1_v3)
>              (triple ?R1_v3 t5:p11 ?R1_uv1)
>            ))
>          (bgp
>            (triple ?R1 t5:p12 t3:Artifact)
>            (triple ?R1 t1:p0 ?R1_resourceContext)
>          )))))
>
> Note the assign ((?R1 <https://host/rm/resources/r1>), which makes the
> query scalable on a large repository
>