Posted to users@jena.apache.org by Rick Moynihan <ri...@swirrl.com> on 2015/06/15 17:57:03 UTC

Bug Round-tripping HAVING clauses to an SSE and back

Hi all,

I've been using the recent fixes to ARQ (made in JENA-954) around rendering
SPARQL queries and have encountered another problem where a valid query
appears to roundtrip to an invalid one.

The problematic query is this:

SELECT ?obs
WHERE {
  ?obs a qb:Observation ;
         qb:measureType ?measure ;
         ?measure ?value ;
         .
}
GROUP BY ?obs
HAVING (COUNT(?value) > 1)

Which generates this SSE:

(project
  (?obs)
  (filter
    (> ?.0 1)
    (group
      (?obs)
      ((?.0
        (count ?value)))
      (bgp
        (triple ?obs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/linked-data/cube#Observation>)
        (triple ?obs <http://purl.org/linked-data/cube#measureType> ?measure)
        (triple ?obs ?measure ?value)))))

But when round tripped back into SPARQL with OpAsQuery.asQuery, leads to
this invalid query:

SELECT  ?obs
WHERE
  { ?obs <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> qb:Observation .
    ?obs qb:measureType ?measure .
    ?obs ?measure ?value
    FILTER ( ?.0 > 1 )
  }
GROUP BY ?obs


R.

Re: Bug Round-tripping HAVING clauses to an SSE and back

Posted by Andy Seaborne <an...@apache.org>.
Hi Rick,

I've put in a new OpAsQuery (and new tests); I also ran against the 
SELECT queries in "sparql-corpus".  Because the outcome of OpAsQuery is 
an equivalent query, not the .equals same query, some of the checking 
was manual but it looks good to me.  Currently, as in the old OpAsQuery, 
it puts in {} in patterns quite often to be safe.

I may look at a separate syntax cleaning step to remove unnecessary 
ones; putting it in the translation process seems wrong to me.

The snapshot development is rebuilding at the moment and contains this fix.

It even "optimized" the query in one case (!!!)

SELECT  *
WHERE
   { SELECT DISTINCT  ?uri ?label
     WHERE
       {...}
     ORDER BY ?uri
   }
OFFSET  0
LIMIT   5000

became

SELECT DISTINCT  ?uri ?label
WHERE
     {...}
     ORDER BY ?uri
OFFSET  0
LIMIT   5000

and some other SELECT * WHERE got removed (if there are no modifiers 
like LIMIT it's a no-op and is not visible in the algebra). The 
'optimization' above is to collapse the levels.

(sparql-corpus/cabi/opendatascotland.sparql is a compendium of queries)

	Andy




Re: Bug Round-tripping HAVING clauses to an SSE and back

Posted by Rick Moynihan <ri...@swirrl.com>.
On 17 June 2015 at 14:13, Andy Seaborne <an...@apache.org> wrote:

> On 17/06/15 10:16, Rick Moynihan wrote:
>
>> Hi Andy,
>>
>> Thanks for raising JENA-963 for me - I'll raise the issues directly in the
>> future.  Sometimes it's hard to know whether things are intended (or at
>> least accepted) behaviours though.
>>
>
> Point taken. It's those unfunded volunteers - can't rely on them!
> The project takes whatever channels work; we're not, I hope, dogmatic.
> When stuff gets detailed, email isn't so good, whether basic formatting
> stuff or just as a record over time, JIRA is better, at least I find so.
> Helps people see what they can contribute as well.
>

I completely agree about your channels point.  Which is precisely why I'll
often go to the mailing list before the bug tracker.  If you're unsure
about the behaviour, or whether it's a bug, you can get quicker feedback by
going to the mailing list first and filing it once you're satisfied it's a
bug.

Regardless, I think 963 was clearly a bug, and I should have directly filed
it for you in JIRA, and will do in the future.

>> Unfortunately I haven't got an exhaustive set of queries we need to
>> support; but we're basically hoping to have all arbitrary SPARQL 1.1
>> queries round-trip back to a query which is at least equivalent when
>> evaluated on any compliant SPARQL 1.1 database to what went in.
>>
>> Most of the problems I've run into have been uncovered either by using it,
>> writing unit tests for my domain code, by integration testing with some of
>> our other components, or in this particular case by a colleague trying to
>> generate some stats on data we have.
>>
>> Would every example query from the SPARQL 1.1 spec be a good start?
>>
>> http://www.w3.org/TR/sparql11-query/
>>
>> I also have a small collection of about 28 different real world queries
>> (mostly for handling RDF data cubes) which were generated via some of our
>> systems that may be useful.  If you'd like me to provide them as potential
>> test cases I'm sure I can do that.
>>
>
> That would be great.
>

Ok, I'm not sure how useful these will be for this bug, but I've created a
repo with 56 real world SPARQL queries (no data), which you're more than
welcome to use as you please.

I've licensed the repo as MIT, which I think should work with Apache; but
I'm happy to grant you an Apache license to the queries as they are too.
Many of the queries were auto generated, so might not be what a user would
write.

https://github.com/Swirrl/sparql-corpus

Let me know if you need anything else.

> I've done some analysis on JENA-963 and written in the cases I think turn
> out for GROUP BY and it would be good to validate that analysis with real
> world queries of interest.
>

Ok, there happen to be 11 real world GROUP BY queries in that repo:

12:07 $ git grep GROUP
cabi/cabi-calculate-level.sparql:} GROUP BY ?leafConcept ?topConcept
cabi/cabi-count-documents-countries.sparql:} GROUP BY ?countable ?countryLabel LIMIT 10 OFFSET 0
cabi/cabi-count-documents-regions.sparql:} GROUP BY ?countable ?countableId ?countableName
cabi/cabi-count-documents-themes.sparql:} GROUP BY ?countable ?countableId
cabi/cabi-graphs.sparql:} GROUP BY ?o ?g
cabi/cabi-research-outputs.sparql:    } GROUP BY ?resource ?title ?projectUri ?outputTitle ?outputDate
cabi/opendatascotland.sparql:} GROUP BY
cabi/spog.sparql:} GROUP BY ?g
cabi/test-sparql.sparql:     } GROUP BY ?resource ?title ?projectUri ?projectId ?outputTitle ?outputDate
cabi/test.sparql:} GROUP BY ?countable ?countableId ?countableName LIMIT 10 OFFSET 10
pmd/dataset_period_row_labels.sparql:            GROUP BY ?row


> It looks to me like the top-down visit-driven translation is good for the
> WHERE{} part of the algebra to query but spotting group, and all its
> details, is more of a pattern matching task.  In fact, having pattern
> matching for the parts outside WHERE{}, all the modifiers in SPARQL, looks
> good.
>
> Algebra that is not in the shape originally generated by the query needs
> to be factored in (not that the contract of OpAsQuery can promise
> perfection there), it's just that, my guess, algebra-like-queries is the
> major use case.
>

I can't say I understand all the details here, but it sounds good.  If you
let me know when the code lands in a SNAPSHOT jar, I'll happily integrate
it with our stuff and see if anything else falls out.


> (Yes, clojure would be perfect for this!)
>

It's funny you should say that!  Our systems are actually written in
Clojure, and rather than make use of the visitors JENA provides - I wrote a
small functional zipper in just 9 lines of clojure.zip that you can use to
trivially traverse SSE trees in a few lines.  Obviously from a clojure
perspective it would be better if SSE items, lists and nodes were actually
clojure data - but the SSE idea made the whole thing a joy.  Bravo!

R.



Re: Bug Round-tripping HAVING clauses to an SSE and back

Posted by Andy Seaborne <an...@apache.org>.
On 17/06/15 10:16, Rick Moynihan wrote:
> Hi Andy,
>
> Thanks for raising JENA-963 for me - I'll raise the issues directly in the
> future.  Sometimes it's hard to know whether things are intended (or at
> least accepted) behaviours though.

Point taken. It's those unfunded volunteers - can't rely on them!

The project takes whatever channels work; we're not, I hope, dogmatic.
When stuff gets detailed, email isn't so good, whether for basic formatting
or just as a record over time; JIRA is better, at least I find so.
It helps people see what they can contribute as well.

Sometimes the channel is hard to deal with (twitter? for detailed 
questions?!!!?)

> Unfortunately I haven't got an exhaustive set of queries we need to
> support; but we're basically hoping to have all arbitrary SPARQL 1.1
> queries round-trip back to a query which is at least equivalent when
> evaluated on any compliant SPARQL 1.1 database to what went in.
>
> Most of the problems I've run into have been uncovered either by using it,
> writing unit tests for my domain code, by integration testing with some of
> our other components, or in this particular case by a colleague trying to
> generate some stats on data we have.
>
> Would every example query from the SPARQL 1.1 spec be a good start?
>
> http://www.w3.org/TR/sparql11-query/
>
> I also have a small collection of about 28 different real world queries
> (mostly for handling RDF data cubes) which were generated via some of our
> systems that may be useful.  If you'd like me to provide them as potential
> test cases I'm sure I can do that.

That would be great.

I've done some analysis on JENA-963 and written in the cases I think
turn out for GROUP BY and it would be good to validate that analysis
with real world queries of interest.

It looks to me like the top-down visit-driven translation is good for
the WHERE{} part of the algebra to query but spotting group, and all
its details, is more of a pattern matching task.  In fact, having
pattern matching for the parts outside WHERE{}, all the modifiers in
SPARQL, looks good.

Algebra that is not in the shape originally generated by the query needs
to be factored in (not that the contract of OpAsQuery can promise
perfection there); it's just that, at a guess, algebra-like-queries is the
major use case.
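
As a rough illustration of the pattern-matching idea (this is not the
OpAsQuery code, and none of the class names below are Jena classes), here is
a toy sketch in plain Java.  It assumes a tiny hypothetical algebra with only
project, filter, group and bgp, and it treats a filter sitting directly over
a group as a HAVING clause, substituting the aggregate back in for the
internal variable ?.0 instead of emitting a FILTER inside WHERE.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HavingSketch {
    // Toy algebra (hypothetical): only the operators seen in the SSE from this thread.
    interface Op {}

    static final class Bgp implements Op {
        final String pattern;                  // triple patterns, already rendered as text
        Bgp(String pattern) { this.pattern = pattern; }
    }

    static final class Group implements Op {
        final String groupVars;                // e.g. "?obs"
        final Map<String, String> aggregates;  // internal variable -> aggregate text
        final Op sub;
        Group(String groupVars, Map<String, String> aggregates, Op sub) {
            this.groupVars = groupVars;
            this.aggregates = aggregates;
            this.sub = sub;
        }
    }

    static final class Filter implements Op {
        final String expr;
        final Op sub;
        Filter(String expr, Op sub) { this.expr = expr; this.sub = sub; }
    }

    static final class Project implements Op {
        final String vars;
        final Op sub;
        Project(String vars, Op sub) { this.vars = vars; this.sub = sub; }
    }

    /** Render project(filter(group(bgp))) or project(group(bgp)) as query text. */
    static String render(Op op) {
        if (!(op instanceof Project)) {
            throw new IllegalArgumentException("sketch expects project at the top");
        }
        Project project = (Project) op;
        Op inner = project.sub;

        // Pattern match: a filter directly over a group is a HAVING clause,
        // not a FILTER inside WHERE.
        String having = null;
        if (inner instanceof Filter && ((Filter) inner).sub instanceof Group) {
            Filter f = (Filter) inner;
            Group g = (Group) f.sub;
            having = f.expr;
            // Put each aggregate back in place of its internal variable.
            // Naive textual substitution, good enough for a sketch.
            for (Map.Entry<String, String> e : g.aggregates.entrySet()) {
                having = having.replace(e.getKey(), e.getValue());
            }
            inner = g;
        }
        if (!(inner instanceof Group) || !(((Group) inner).sub instanceof Bgp)) {
            throw new IllegalArgumentException("sketch handles group over bgp only");
        }
        Group group = (Group) inner;

        StringBuilder sb = new StringBuilder();
        sb.append("SELECT ").append(project.vars).append('\n');
        sb.append("WHERE { ").append(((Bgp) group.sub).pattern).append(" }\n");
        sb.append("GROUP BY ").append(group.groupVars).append('\n');
        if (having != null) {
            sb.append("HAVING (").append(having).append(")\n");
        }
        return sb.toString();
    }

    /** The algebra from the original report, rebuilt by hand in the toy types. */
    static Op example() {
        Map<String, String> aggs = new LinkedHashMap<>();
        aggs.put("?.0", "COUNT(?value)");
        Op bgp = new Bgp("?obs a qb:Observation ; qb:measureType ?measure ; ?measure ?value");
        return new Project("?obs", new Filter("?.0 > 1", new Group("?obs", aggs, bgp)));
    }

    public static void main(String[] args) {
        System.out.print(render(example()));
    }
}
```

The point of the sketch is only the shape check: the filter and the group are
recognised together, before any WHERE{} text is produced, so the internal
variable never leaks into the output.  A real translation also has to cope
with extend, order, distinct, slice and algebra that did not come from a
query, which this does not attempt.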

(Yes, clojure would be perfect for this!)

	Andy



Re: Bug Round-tripping HAVING clauses to an SSE and back

Posted by Rick Moynihan <ri...@swirrl.com>.
Hi Andy,

Thanks for raising JENA-963 for me - I'll raise the issues directly in the
future.  Sometimes it's hard to know whether things are intended (or at
least accepted) behaviours though.

Unfortunately I haven't got an exhaustive set of queries we need to
support; but we're basically hoping to have all arbitrary SPARQL 1.1
queries round-trip back to a query which is at least equivalent when
evaluated on any compliant SPARQL 1.1 database to what went in.

Most of the problems I've run into have been uncovered either by using it,
writing unit tests for my domain code, by integration testing with some of
our other components, or in this particular case by a colleague trying to
generate some stats on data we have.

Would every example query from the SPARQL 1.1 spec be a good start?

http://www.w3.org/TR/sparql11-query/

I also have a small collection of about 28 different real world queries
(mostly for handling RDF data cubes) which were generated via some of our
systems that may be useful.  If you'd like me to provide them as potential
test cases I'm sure I can do that.

R.


Re: Bug Round-tripping HAVING clauses to an SSE and back

Posted by Andy Seaborne <an...@apache.org>.
Hi Rick,

Sorry, you're not having a good time of it here.

Not one but 2 related bugs (filter in wrong place, lost the aggregate 
function) this time.  HAVING is particularly hard because it isn't a 
simple mapping to one algebra form.

If split up:
--------------
PREFIX  qb:   <http://purl.org/linked-data/cube#>

SELECT  ?obs (COUNT(?value) AS ?C)
WHERE
   { ?obs a qb:Observation .
     ?obs qb:measureType ?measure .
     ?obs ?measure ?value
   }
GROUP BY ?obs
HAVING ( ?C > 1 )
--------------
it goes wrong as well.

I've recorded it as

https://issues.apache.org/jira/browse/JENA-963

A couple of things would be good:

You can raise JIRA directly - I attached code to the JIRA like it was 
from JENA-954.  Prefixes etc. - query-in, query-out.

What would be really good is to fix the test coverage.  "TestOpAsQuery" is
the test class. Do you have a complete (nearly complete ...) list of
features? What's missing in TestOpAsQuery?

If we can get the coverage up, we'll be in a better position long term.

	Andy
