Posted to users@jena.apache.org by Andy Seaborne <an...@apache.org> on 2017/11/04 11:18:40 UTC

SPARQL Aggregation : spec and implementation

Following on from the thread in October "Problem with MAX when no result 
expected" <https://s.apache.org/03Hz> here is an investigation into 
SPARQL Aggregation, both what the spec says and what ARQ does.

It would be interesting to get reports about other implementations for 
these two queries:

Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s

It doesn't matter what the data is - the WHERE {} does not match 
anything. Or use Q1,Q2 below with an empty dataset.

== tl;dr

I think that ARQ follows the spec but the spec is not what you might 
expect from an SQL background.

==

The original question arose from the case of no matches to the WHERE clause.

* What happens when there is no GROUP BY
* What happens when there is a GROUP BY

Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o }
Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o } GROUP BY ?s

with no matches to the graph pattern  { ?s ?p ?o }.
(beware below! {} are being used in different ways!)

== The SPARQL 1.1 specification

In the definition of "Group",
https://www.w3.org/TR/sparql11-query/#defn_algGroup

If the pattern does not match, then Group(exprlist, Ω) is the empty set ∅.

ListEval(exprlist, μ) is the GROUP BY key
   μ is a row ("solution mapping" in the spec).

ListEval(exprlist, μ) → { μ' ... } is the grouping of rows by GROUP BY. 
For each group, there is a set of rows. It's a map from key to set of rows.

When there are no matches to the pattern "WHERE { ?s ?p ?o }", there is 
no μ, so there are no entries in this map; { μ' ... } is the empty set ∅.

From here, the rest of the definitions simplify down to empty.

Aggregation(exprlist, func, scalarvals, ∅) is ∅.
AggregateJoin(∅) = ∅.
Flatten(∅) = ∅.

COUNT(∅) = 0 (xsd:integer zero)
SUM(∅) = 0
AVG, MIN, MAX, SAMPLE are errors
GROUP_CONCAT = ""

Throughout this, it does not matter if there is a GROUP BY clause or not 
because "Group" is the empty set ∅ in both cases.

This may be surprising if you are used to SQL because in SQL with GROUP 
BY you get no rows, and COUNT is never zero, whereas with no GROUP BY, 
you get one row and a COUNT of zero.
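
The SQL behaviour described above can be checked with a small Python 
sketch using the standard-library sqlite3 module. The table name and its 
one row are made up for illustration, and SQLite is only standing in for 
"SQL" here; other engines are assumed, not verified, to agree.

```python
import sqlite3

# Hypothetical three-column table standing in for an RDF graph.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute("INSERT INTO triples VALUES ('s1', 'p1', 'o1')")

# "WHERE 1 = 0" plays the role of FILTER(false): nothing matches.
no_group = conn.execute(
    "SELECT COUNT(*) FROM triples WHERE 1 = 0"
).fetchall()
with_group = conn.execute(
    "SELECT COUNT(*) FROM triples WHERE 1 = 0 GROUP BY s"
).fetchall()
conn.close()

print(no_group)    # [(0,)]  one row, COUNT is zero
print(with_group)  # []      no rows at all
```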

== Results from ARQ

Q1: no matches:
------
| C1 |
======
| 0  |
------
Q2: no matches:
------
| C2 |
======
| 0  |
------

Re: SPARQL Aggregation : spec and implementation

Posted by james anderson <ja...@dydra.com>.
good evening,

> On 2017-11-04, at 17:34, Andy Seaborne <an...@apache.org> wrote:
> 
> 
> 
> On 04/11/17 13:46, james anderson wrote:
>> good afternoon;
>>> On 2017-11-04, at 12:18, Andy Seaborne <an...@apache.org> wrote:
>>> 
>>> Following on from the thread in October "Problem with MAX when no result expected" <https://s.apache.org/03Hz> here is an investigation into SPARQL Aggregation, both what the spec says and what ARQ does.
>>> 
>>> It would be interesting to get reports about other implementations for these two queries:
>>> 
>>> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
>>> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s
> 
> For clarity:
> 
>> http://dydra.com/jhacker/foaf/@query#q1 : 0
> 
> Is that zero rows, or one row with a single column C1 holding the value 0?
> 
>> http://dydra.com/jhacker/foaf/@query#q2 : no result
> 
> Is that zero rows or execution error/illegal query?

the urls locate runnable queries.
if you hit the “run” button, the answers to those questions should appear.

> 
>>> Throughout this, it does not matter if there is a GROUP BY clause or not because "Group" is the empty set ∅ in both cases.
>> there is a reading of the recommendation text according to which the two sets which are empty are not of the same kind.
>> in one case, the set is that over which the aggregation operations run and in that case, the result is 0.
>> in the other case it is the set of groups, in which case, there is no set of solutions over which to run the aggregation and therefore there is no result.
> 
> Maybe, maybe not. Please quote the text that gives the alternative reading.  I don't see a reading that mentions kinds.
> 
> I missed the fact that when there is no GROUP BY the group key is some constant like 1.
> 
> But that does not change anything because Group() is defined in a way that uses the rows matching WHERE{} in a foreach and foreach of Ω = empty set is the empty set.
> 
> Group(exprlist, Ω) = { ListEval(exprlist, μ) -> ... | μ in Ω }
> 
> so if there are no μ, ListEval is not evaluated and Group() is the empty set. There is only one empty set. It does not carry any indication of use of GROUP BY or not.

yes, but it is not a set of solutions, it is a set of keys which are to be used to partition solutions.

> 
> By the end of Group() the situation is the same

as what?

> so the result will be the same for whatever reading of the rest of the process (there are some parts I'm not completely confident with yet).

as is evident from my comment above, i do not read the document in a way which agrees with this claim.

> 
> Specifically, for no GROUP BY the result is not a map of the constant to the empty multiset {1 -> {}}

agreed, it is a set of solutions.
which may be empty, but is not the same thing as an empty set of mappings from keys to sets of solutions.

> 
>> is there some way to read the text such that the empty set of groups transforms into an empty set of solutions?
> 
> Not that I can see; it should be covered by {1 -> {}} vs {} but that distinction is lost at Group().

i read the passage which describes aggregation under 18.5

    Aggregation(exprlist, func, scalarvals, { key1→Ω1, ..., keym→Ωm } ) = { (key, F(Ω)) | key → Ω in { key1→Ω1, ..., keym→Ωm } }

to reduce, in the case where no key is generated, to

    Aggregation(exprlist, func, scalarvals, { } )

how else is that to be read?
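
as an illustration only, the Aggregation definition quoted above can be sketched in python, with a dict standing in for the key→Ω map and a row count standing in for F = COUNT. the names `aggregation` and `count` are hypothetical and not taken from any implementation; exprlist and scalarvals are left out because COUNT(*) does not need them.

```python
def count(rows):
    """Stand-in for F when the aggregate is COUNT(*)."""
    return len(rows)

def aggregation(func, groups):
    """Sketch of Aggregation(...): apply func to the rows of each group."""
    return {key: func(rows) for key, rows in groups.items()}

no_groups = aggregation(count, {})            # empty set of groups
one_empty_group = aggregation(count, {1: []}) # one group with no rows

print(no_groups)        # {}
print(one_empty_group)  # {1: 0}
```

the sketch only shows that { } and {1 -> {}} give different results once Aggregation is reached; it says nothing about which of the two Group() should produce.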

best regards, from berlin,


---
james anderson | james@dydra.com | http://dydra.com






Re: SPARQL Aggregation : spec and implementation

Posted by Andy Seaborne <an...@apache.org>.

On 04/11/17 13:46, james anderson wrote:
> good afternoon;
> 
>> On 2017-11-04, at 12:18, Andy Seaborne <an...@apache.org> wrote:
>>
>> Following on from the thread in October "Problem with MAX when no result expected" <https://s.apache.org/03Hz> here is an investigation into SPARQL Aggregation, both what the spec says and what ARQ does.
>>
>> It would be interesting to get reports about other implementations for these two queries:
>>
>> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
>> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s
> 

For clarity:

> http://dydra.com/jhacker/foaf/@query#q1 : 0

Is that zero rows, or one row with a single column C1 holding the value 0?

> http://dydra.com/jhacker/foaf/@query#q2 : no result

Is that zero rows or execution error/illegal query?

>> Throughout this, it does not matter if there is a GROUP BY clause or not because "Group" is the empty set ∅ in both cases.
> 
> there is a reading of the recommendation text according to which the two sets which are empty are not of the same kind.
> in one case, the set is that over which the aggregation operations run and in that case, the result is 0.
> in the other case it is the set of groups, in which case, there is no set of solutions over which to run the aggregation and therefore there is no result.

Maybe, maybe not. Please quote the text that gives the alternative 
reading.  I don't see a reading that mentions kinds.

I missed the fact that when there is no GROUP BY the group key is some 
constant like 1.

But that does not change anything because Group() is defined in a way 
that uses the rows matching WHERE{} in a foreach and foreach of Ω = 
empty set is the empty set.

Group(exprlist, Ω) = { ListEval(exprlist, μ) -> ... | μ in Ω }

so if there are no μ, ListEval is not evaluated and Group() is the empty 
set. There is only one empty set. It does not carry any indication of 
use of GROUP BY or not.
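
A minimal Python sketch of that reading of Group(), for illustration 
only. The function names are hypothetical, rows are plain dicts, and the 
"..." part of the definition is taken to be "the rows sharing that key", 
which is an assumption about the elided text.

```python
def group(list_eval, omega):
    """Sketch of Group(exprlist, Ω): map each key to the rows producing it."""
    groups = {}
    for mu in omega:              # never entered when omega is empty
        key = list_eval(mu)
        groups.setdefault(key, []).append(mu)
    return groups

calls = []                        # records every ListEval call

def constant_key(mu):
    """No GROUP BY: every row gets the same constant key."""
    calls.append(mu)
    return (1,)

def key_by_s(mu):
    """GROUP BY ?s: the key is the binding of ?s."""
    calls.append(mu)
    return (mu["s"],)

empty_omega = []                  # WHERE {} matched nothing
no_group_by = group(constant_key, empty_omega)
with_group_by = group(key_by_s, empty_omega)

print(no_group_by, with_group_by, calls)  # {} {} []
```

Both calls return the same empty dict and neither key function is ever 
invoked, which is the point made above.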

By the end of Group() the situation is the same so the result will be 
the same for whatever reading of the rest of the process (there are 
some parts I'm not completely confident with yet).

Specifically, for no GROUP BY the result is not a map of the constant to 
the empty multiset {1 -> {}}

> is there some way to read the text such that the empty set of groups transforms into an empty set of solutions?

Not that I can see; it should be covered by {1 -> {}} vs {} but that 
distinction is lost at Group().

	Andy


> 
> 
> 
> ---
> james anderson | james@dydra.com | http://dydra.com
> 
> 
> 
> 
> 

Re: SPARQL Aggregation : spec and implementation

Posted by james anderson <ja...@dydra.com>.
good afternoon;

> On 2017-11-04, at 12:18, Andy Seaborne <an...@apache.org> wrote:
> 
> Following on from the thread in October "Problem with MAX when no result expected" <https://s.apache.org/03Hz> here is an investigation into SPARQL Aggregation, both what the spec says and what ARQ does.
> 
> It would be interesting to get reports about other implementations for these two queries:
> 
> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s

http://dydra.com/jhacker/foaf/@query#q1 : 0
http://dydra.com/jhacker/foaf/@query#q2 : no result

> 
> It doesn't matter what the data is - the WHERE {} does not match anything. Or use Q1,Q2 below with an empty dataset.
> 
> == tl;dr
> 
> I think that ARQ follows the spec but the spec is not what you might expect from an SQL background.

i have no background implementing sql.

> 
> ==
> 
> The original question arose from the case of no matches to the WHERE clause.
> 
> * What happens when there is no GROUP BY
> * What happens when there is a GROUP BY
> 
> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o }
> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o } GROUP BY ?s
> 
> with no matches to the graph pattern  { ?s ?p ?o }.
> (beware below! {} are being used in different ways!)
> 
> == The SPARQL 1.1 specification
> 
> In the definition of "Group",
> https://www.w3.org/TR/sparql11-query/#defn_algGroup
> 
> If the pattern does not match, then Group(exprlist, Ω) is the empty set ∅.
> 
> ListEval(exprlist, μ) is the GROUP BY key
>  μ is a row ("solution mapping" in the spec).
> 
> ListEval(exprlist, μ) → { μ' ... } is the grouping of rows by GROUP BY. For each group, there is a set of rows. It's a map from key to set of rows.
> 
> When there are no matches to the pattern "WHERE { ?s ?p ?o }", there is no μ, so there are no entries in this map; { μ' ... } is the empty set ∅.
> 
> From here, the rest of the definitions simplify down to empty.
> 
> Aggregation(exprlist, func, scalarvals, ∅) is ∅.
> AggregateJoin(∅) = ∅.
> Flatten(∅) = ∅.
> 
> COUNT(∅) = 0 (xsd:integer zero)
> SUM(∅) = 0
> AVG, MIN, MAX, SAMPLE are errors
> GROUP_CONCAT = ""
> 
> Throughout this, it does not matter if there is a GROUP BY clause or not because "Group" is the empty set ∅ in both cases.

there is a reading of the recommendation text according to which the two sets which are empty are not of the same kind.
in one case, the set is that over which the aggregation operations run and in that case, the result is 0.
in the other case it is the set of groups, in which case, there is no set of solutions over which to run the aggregation and therefore there is no result.

is there some way to read the text such that the empty set of groups transforms into an empty set of solutions?



---
james anderson | james@dydra.com | http://dydra.com