Posted to users@jena.apache.org by Paul Tyson <ph...@sbcglobal.net> on 2016/09/14 01:01:03 UTC

sparql algebra differences jena 2.13.0/3.n

I have some queries that worked fine in jena-2.13.0 but not in
jena-3.1.0, using the same data.

For a long time I've been running a couple dozen queries regularly over
a large (900M triples) TDB, using jena-2.13.0. When I recently upgraded
to jena-3.1.0, I found that 5 of these queries would not return (ran
forever). qparse revealed that the sparql algebra is quite different in
2.13.0 and 3.1.0 (or apparently any 3.n.n version).

Here is a sample query that worked in 2.13.0 but not in 3.1.0, along
with the algebra given by qparse --explain for 2.13.0 and 3.1.0:

prefix : <http://example.org>
CONSTRUCT {
  ?var1 <http://www.w3.org/2004/02/skos/core#exactMatch> ?var2 .
}
WHERE {
  FILTER ( (?var3 = "str1" || ?var3 = "str2")
           && !(?var4 = "" || ?var4 = "str3" || regex(?var4, "pat1")) )
  ?var2 :p1 ?var4 ; :p2 ?var3 .
  {
    { ?var1 :p3 ?var4 . }
    UNION
    { ?var1 :p4 ?var4 . }
  }
}
 
Jena-2.13.0 produces algebra:
(prefix ((: <http://example.org>))
  (sequence
    (filter (|| (= ?var3 "str1") (= ?var3 "str2"))
      (sequence
        (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4 "pat1")))
          (bgp (triple ?var2 :p1 ?var4)))
        (bgp (triple ?var2 :p2 ?var3))))
    (union
      (bgp (triple ?var1 :p3 ?var4))
      (bgp (triple ?var1 :p4 ?var4)))))
 
Jena-3.1.0 produces algebra:
(prefix ((: <http://example.org>))
  (filter (! (|| (|| (= ?var4 "") (= ?var4 "str3")) (regex ?var4 "pat1")))
    (disjunction
      (assign ((?var3 "str1"))
        (sequence
          (bgp
            (triple ?var2 :p1 ?var4)
            (triple ?var2 :p2 "str1")
          )
          (union
            (bgp (triple ?var1 :p3 ?var4))
            (bgp (triple ?var1 :p4 ?var4)))))
      (assign ((?var3 "str2"))
        (sequence
          (bgp
            (triple ?var2 :p1 ?var4)
            (triple ?var2 :p2 "str2")
          )
          (union
            (bgp (triple ?var1 :p3 ?var4))
            (bgp (triple ?var1 :p4 ?var4))))))))
 
Thanks for any insight or assistance into this problem.

Regards,
--Paul


Re: sparql algebra differences jena 2.13.0/3.n

Posted by Andy Seaborne <an...@apache.org>.
Hi Paul,

On 14/09/16 13:15, Paul Tyson wrote:
> On Wed, 2016-09-14 at 10:57 +0100, Andy Seaborne wrote:
>> Hi Paul,
>>
>> It's difficult to tell what's going on from your report. Plain strings
>> are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have
>> reloaded the data for running Jena 3.x.
>
> I admit I have not studied the subtleties around string literals with
> and without datatype tags. None of my data loadfiles have tagged string
> literals, nor do my queries. Are you saying they should?

If there are no ^^xsd:string anywhere whatsoever, I don't think a reload 
is necessary but I can't guarantee it.

There is no point saying ^^xsd:string in RDF 1.1.  "abc" and 
"abc"^^xsd:string are two ways to write exactly the same thing.

>
>>
>> On less data, does either case produce the wrong answers?
>>
>
> I'll produce a smaller dataset to test.
>
>> The regex is not being pushed inwards in the same way which may be an
>> issue - it "all depends" on the data.
>>
>> A smaller query exhibiting a timing difference would be very helpful.
>> Are all parts of the FILTER necessary for the effect?
> Yes, they eliminate spurious matches.

OK - in your data they are needed but are they needed to show the effect?

I've mocked up pushing the regex inwards in 3.x - what I don't know is 
whether this is actually the thing you are seeing, or whether there is 
something else going on which is the dominant effect.

	Andy
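
On the plain string point: a quick way to check how a given engine
treats the two ways of writing a string literal is a constant-only
query. This is a minimal sketch, not something taken from the report
above. On an RDF 1.1 engine such as Jena 3.x it should answer true,
because both spellings denote the same term.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# No data is needed: the empty group pattern yields one solution,
# and the FILTER compares the two ways of writing the literal.
ASK { FILTER ( sameTerm("abc", "abc"^^xsd:string) ) }
```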

>
>>
>> 	Andy
>>
>> Unrelated:
>>
>> {
>> ?var1 :p3 ?var4 .
>> } UNION {
>> ?var1 :p4 ?var4 .
>> }
>>
>> can be written
>>
>> ?var1 (:p3|:p4) ?var4
>>
>>
> Yes, but I generate these queries from RIF source, and UNION is easier
> for the general RIF statement "Or(x,y)". The surface syntax doesn't make
> any difference in the algebra, does it?
>
> Regards,
> --Paul
>

Re: sparql algebra differences jena 2.13.0/3.n

Posted by Andy Seaborne <an...@apache.org>.
In Jena 2.x (RDF 1.0), rewriting "?var = string" as a constant in the
graph pattern was not done as an optimization, because under RDF 1.0 the
two are not the same thing. In RDF 1.1 they are, so the optimization
became practical. At the same time, the whole of filter placement has
been rewritten to make it more comprehensive.

The disjunction produced by the string-equality rewrite was not handled
by the later filter placement step:

https://issues.apache.org/jira/browse/JENA-1235
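
Read back into query syntax, the 3.1.0 algebra from the original post is
roughly the query below. This is a hand-written sketch for illustration,
not qparse output, and it uses BIND to stand in for (assign ...). It
shows the ?var4 test running only at the very end, after each branch has
already joined the :p1/:p2 pattern with the UNION.

```sparql
PREFIX : <http://example.org>

SELECT * WHERE {
  {
    ?var2 :p1 ?var4 ;
          :p2 "str1" .
    { ?var1 :p3 ?var4 } UNION { ?var1 :p4 ?var4 }
    BIND ("str1" AS ?var3)
  }
  UNION
  {
    ?var2 :p1 ?var4 ;
          :p2 "str2" .
    { ?var1 :p3 ?var4 } UNION { ?var1 :p4 ?var4 }
    BIND ("str2" AS ?var3)
  }
  # The ?var4 part of the filter sits outside the disjunction,
  # so it is applied to the joined results rather than next to :p1.
  FILTER ( ! ( ?var4 = "" || ?var4 = "str3" || regex(?var4, "pat1") ) )
}
```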

> I have noticed other cases where order of triples and bgps makes
> quite a difference in execution time, but I can't figure out any
> science to it. Are there any guidelines for ordering the components
> of a complex query (including UNION and OPTIONAL clauses) to optimize
> performance? Can you tell anything by a static analysis of the sparql
> algebra?

BGP optimization and UNION/OPTIONAL are different optimization steps. 
For TDB, BGP uses the stats file if present, else the fixed optimization 
style of most grounded triple pattern first. If there are equal 
re-orderings, the original order is retained.
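
As an illustration of "most grounded first", take two patterns of the
kind used in these queries. This is a sketch of the idea only: the real
weighting has more cases, and a stats file can change the outcome.

```sparql
PREFIX : <http://example.org>

SELECT * WHERE {
  # As written, the less constrained pattern comes first.
  ?var1 :p2 ?var3 .     # one ground term: the predicate
  ?var1 :p4 "str3" .    # two ground terms: predicate and object
}
# With no stats file, the fixed rule would be expected to evaluate
# ?var1 :p4 "str3" first and then ?var1 :p2 ?var3 with ?var1 bound.
```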

For UNION/OPTIONAL, there are quite a number of optimizations, equality
and filter placement being two of the major ones.

There is a tension between these two high level optimizations and BGP 
reordering (which is done later). They both may be beneficial yet 
telling which is better is quite hard and very data dependent.

	Andy


Re: sparql algebra differences jena 2.13.0/3.n

Posted by Paul Tyson <ph...@sbcglobal.net>.
I looked at some more queries that worked in jena 2.x but seem to hang
in 3.x. They all follow the same pattern of a complex FILTER query on
string values. Rewriting the filter conditions into subgraph patterns
solved the problem.

Is this a defect induced by algebra optimizations in 3.x? Or, is it more
proper to apply string filters in the manner you suggested, by enclosing
them in subgraph patterns close to the triples they filter?

There was one case that was a little more complex. The original query
was like:

CONSTRUCT {
  ?var1 :p1 false .
}
WHERE {
  FILTER ( ?var2 != "str1" && !strstarts(?var3, "str2") )
  ?var1 :p2 ?var3 ;
        :p3 ?var2 ;
        :p4 "str3" ;
        :p5 "str4" ;
        :p6 "str5" .
  FILTER NOT EXISTS {
    FILTER ( ?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8"
             || ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11"
             || ?var4 = "str12" || ?var4 = "str13" )
    ?var5 :p7 ?var4 ;
          :p8 ?var3 .
  }
}

I initially rewrote the FILTER NOT EXISTS clause to read:

FILTER NOT EXISTS {
  {
    FILTER ( ?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8"
             || ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11"
             || ?var4 = "str12" || ?var4 = "str13" )
    ?var5 :p7 ?var4 .
  }
  ?var5 :p8 ?var3 .
}

which still seemed to hang. Reordering the FILTER NOT EXISTS bgp to the
following solved the problem.

FILTER NOT EXISTS {
  ?var5 :p8 ?var3 .
  {
    FILTER ( ?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8"
             || ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11"
             || ?var4 = "str12" || ?var4 = "str13" )
    ?var5 :p7 ?var4 .
  }
}
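
Assembled into one query, the version that returns is sketched below.
The PREFIX line is an assumption carried over from the first example in
this thread (the fragments above do not show one); everything else is
taken from the original query and the reordered clause.

```sparql
# Assumed prefix; not part of the fragments above.
PREFIX : <http://example.org>

CONSTRUCT {
  ?var1 :p1 false .
}
WHERE {
  FILTER ( ?var2 != "str1" && !strstarts(?var3, "str2") )
  ?var1 :p2 ?var3 ;
        :p3 ?var2 ;
        :p4 "str3" ;
        :p5 "str4" ;
        :p6 "str5" .
  FILTER NOT EXISTS {
    # The pattern that shares ?var3 with the outer query comes first,
    ?var5 :p8 ?var3 .
    # followed by the string filter in its own group, next to :p7.
    {
      FILTER ( ?var4 = "str6" || ?var4 = "str7" || ?var4 = "str8"
               || ?var4 = "str9" || ?var4 = "str10" || ?var4 = "str11"
               || ?var4 = "str12" || ?var4 = "str13" )
      ?var5 :p7 ?var4 .
    }
  }
}
```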

I have noticed other cases where order of triples and bgps makes quite a
difference in execution time, but I can't figure out any science to it.
Are there any guidelines for ordering the components of a complex query
(including UNION and OPTIONAL clauses) to optimize performance? Can you
tell anything by a static analysis of the sparql algebra?

Regards,
--Paul 






Re: sparql algebra differences jena 2.13.0/3.n

Posted by Paul Tyson <ph...@sbcglobal.net>.
Andy,

With that rewrite, the 3.x tdbquery works as expected.

I will investigate further this weekend and send other queries that don't work in 3.x.

Regards,
--Paul


Re: sparql algebra differences jena 2.13.0/3.n

Posted by Paulo Picota <pp...@gmail.com>.
Hello, I would like to be removed from this Jena list =)

thanks for everything.

PP


Re: sparql algebra differences jena 2.13.0/3.n

Posted by Andy Seaborne <an...@apache.org>.
Paul, could you try the query below? It mimics the effect of placing
the ?var4 part of the filter, which will help determine whether this is
a filter placement issue or not.

The difference is that the first basic graph pattern is inside a {}
together with the relevant part of the filter expression.

	Andy


PREFIX  :     <http://example/>

SELECT  *
WHERE
   { FILTER ( ( ?var3 = "str1" ) || ( ?var3 = "str2" ) )
     { ?var2  :p1  ?var4 ;
              :p2  ?var3
       FILTER ( ! ( ( ( ?var4 = "" ) ||
                ( ?var4 = "str3" ) ) ||
                regex(?var4, "pat1") ) )
     }
     {   { ?var1  :p3  ?var4 }
       UNION
         { ?var1  :p4  ?var4 }
     }
   }


	Andy


On 14/09/16 13:15, Paul Tyson wrote:
> On Wed, 2016-09-14 at 10:57 +0100, Andy Seaborne wrote:
>> Hi Paul,
>>
>> It's difficult to tell what's going on from your report. Plain strings
>> are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have
>> reloaded the data for running Jena 3.x.
>
> I admit I have not studied the subtleties around string literals with
> and without datatype tags. None of my data loadfiles have tagged string
> literals, nor do my queries. Are you saying they should?
>
>>
>> On less data, does either case produce the wrong answers?
>>
>
> I'll produce a smaller dataset to test.
>
>> The regex is not being pushed inwards in the same way which may be an
>> issue - it "all depends" on the data.
>>
>> A smaller query exhibiting a timing difference would be very helpful.
>> Are all parts of the FILTER necessary for the effect?
> Yes, they eliminate spurious matches.
>
>>
>> 	Andy
>>
>> Unrelated:
>>
>> {
>> ?var1 :p3 ?var4 .
>> } UNION {
>> ?var1 :p4 ?var4 .
>> }
>>
>> can be written
>>
>> ?var1 (:p3|:p4) ?var4
>>
>>
> Yes, but I generate these queries from RIF source, and UNION is easier
> for the general RIF statement "Or(x,y)". The surface syntax doesn't make
> any difference in the algebra, does it?
>
> Regards,
> --Paul

Re: sparql algebra differences jena 2.13.0/3.n

Posted by Paul Tyson <ph...@sbcglobal.net>.
On Wed, 2016-09-14 at 10:57 +0100, Andy Seaborne wrote:
> Hi Paul,
> 
> It's difficult to tell what's going on from your report. Plain strings 
> are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have 
> reloaded the data for running Jena 3.x.

I admit I have not studied the subtleties around string literals with
and without datatype tags. None of my data loadfiles have tagged string
literals, nor do my queries. Are you saying they should?

> 
> On less data, does either case produce the wrong answers?
> 

I'll produce a smaller dataset to test.

> The regex is not being pushed inwards in the same way which may be an 
> issue - it "all depends" on the data.
> 
> A smaller query exhibiting a timing difference would be very helpful. 
> Are all parts of the FILTER necessary for the effect?
Yes, they eliminate spurious matches.

> 
> 	Andy
> 
> Unrelated:
> 
> {
> ?var1 :p3 ?var4 .
> } UNION {
> ?var1 :p4 ?var4 .
> }
> 
> can be written
> 
> ?var1 (:p3|:p4) ?var4
> 
> 
Yes, but I generate these queries from RIF source, and UNION is easier
for the general RIF statement "Or(x,y)". The surface syntax doesn't make
any difference in the algebra, does it?

Regards,
--Paul




Re: sparql algebra differences jena 2.13.0/3.n

Posted by Andy Seaborne <an...@apache.org>.

On 14/09/16 11:32, Rob Vesse wrote:
> Andy
>
> I am a little surprised at the filter equality transformation being
> applied to this query. The variables in question are only used in the
> object position, so it shouldn’t be safe to assume that value equality
> is equivalent to term equality here. Or is this a special case because
> of the change in handling of plain literals in RDF 1.1?

Yes, this is a case that is enabled by RDF 1.1.

FILTER (?object = "abc") can be used as a pattern match because term and
value equality are equivalent.

sameTerm("abc", "abc"^^xsd:string) was false (RDF 1.0), and is now true 
(RDF 1.1).
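
As a rough way to check which behaviour a given installation has, a query
along these lines could be run. This is only a sketch, not something taken
from the original report, and the expected answers are the ones stated
above rather than results verified here.

```sparql
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

# Asks whether a plain string and the same string typed as xsd:string
# are the same RDF term. Per the note above: false under RDF 1.0
# (Jena 2.x), true under RDF 1.1 (Jena 3.x).
ASK { FILTER sameTerm("abc", "abc"^^xsd:string) }
```

When the two terms are the same, a filter such as FILTER (?object = "abc")
over a pattern ?s :p ?object can be replaced by matching ?s :p "abc"
directly, which is the rewrite visible in the 3.1.0 algebra.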

     Andy

>
> Rob
>
> On 14/09/2016 10:57, "Andy Seaborne" <an...@apache.org> wrote:
>
>     Hi Paul,
>
>     It's difficult to tell what's going on from your report. Plain strings
>     are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have
>     reloaded the data for running Jena 3.x.
>
>     On less data, does either case produce the wrong answers?
>
>     The regex is not being pushed inwards in the same way which may be an
>     issue - it "all depends" on the data.
>
>     A smaller query exhibiting a timing difference would be very helpful.
>     Are all parts of the FILTER necessary for the effect?
>
>     	Andy
>
>     Unrelated:
>
>     {
>     ?var1 :p3 ?var4 .
>     } UNION {
>     ?var1 :p4 ?var4 .
>     }
>
>     can be written
>
>     ?var1 (:p3|:p4) ?var4

Re: sparql algebra differences jena 2.13.0/3.n

Posted by Rob Vesse <rv...@dotnetrdf.org>.
Andy

I am a little surprised at the filter equality transformation being
applied to this query. The variables in question are only used in the
object position, so it shouldn’t be safe to assume that value equality
is equivalent to term equality here. Or is this a special case because
of the change in handling of plain literals in RDF 1.1?

Rob

On 14/09/2016 10:57, "Andy Seaborne" <an...@apache.org> wrote:

    Hi Paul,
    
    It's difficult to tell what's going on from your report. Plain strings 
    are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have 
    reloaded the data for running Jena 3.x.
    
    On less data, does either case produce the wrong answers?
    
    The regex is not being pushed inwards in the same way which may be an 
    issue - it "all depends" on the data.
    
    A smaller query exhibiting a timing difference would be very helpful. 
    Are all parts of the FILTER necessary for the effect?
    
    	Andy
    
    Unrelated:
    
    {
    ?var1 :p3 ?var4 .
    } UNION {
    ?var1 :p4 ?var4 .
    }
    
    can be written
    
    ?var1 (:p3|:p4) ?var4
    
    





Re: sparql algebra differences jena 2.13.0/3.n

Posted by Andy Seaborne <an...@apache.org>.
Hi Paul,

It's difficult to tell what's going on from your report. Plain strings 
are not quite identical in RDF 1.0 and RDF 1.1 so I hope you have 
reloaded the data for running Jena 3.x.

On less data, does either case produce the wrong answers?

The regex is not being pushed inwards in the same way which may be an 
issue - it "all depends" on the data.

A smaller query exhibiting a timing difference would be very helpful. 
Are all parts of the FILTER necessary for the effect?

	Andy

Unrelated:

{
?var1 :p3 ?var4 .
} UNION {
?var1 :p4 ?var4 .
}

can be written

?var1 (:p3|:p4) ?var4
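
A sketch of how the original query might look with that rewrite applied.
It is untested against the original data; the prefix, predicates and
variable names are copied from the original query, and only the UNION
block is changed.

```sparql
prefix : <http://example.org>

CONSTRUCT {
  ?var1 <http://www.w3.org/2004/02/skos/core#exactMatch> ?var2 .
}
WHERE {
  FILTER ((?var3 = "str1" || ?var3 = "str2")
          && !(?var4 = "" || ?var4 = "str3" || regex(?var4, "pat1")))
  ?var2 :p1 ?var4 ; :p2 ?var3 .
  # Alternative property path in place of the two-branch UNION.
  ?var1 (:p3|:p4) ?var4 .
}
```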

