Posted to users@jena.apache.org by Andy Seaborne <an...@apache.org> on 2017/11/04 11:18:40 UTC
SPARQL Aggregation : spec and implementation
Following on from the thread in October "Problem with MAX when no result
expected" <https://s.apache.org/03Hz> here is an investigation into
SPARQL Aggregation, both what the spec says and what ARQ does.
It would be interesting to get reports about other implementations for
these two queries:
Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s
It doesn't matter what the data is - the WHERE {} does not match
anything. Or use Q1,Q2 below with an empty dataset.
== tl;dr
I think that ARQ follows the spec but the spec is not what you might
expect from an SQL background.
==
The original question arose from the case of no matches to the WHERE clause.
* What happens when there is no GROUP BY
* What happens when there is a GROUP BY
Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o }
Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o } GROUP BY ?s
with no matches to the graph pattern { ?s ?p ?o }.
(beware below! {} are being used in different ways!)
== The SPARQL 1.1 specification
In the definition of "Group",
https://www.w3.org/TR/sparql11-query/#defn_algGroup
If the pattern does not match, then Group(exprlist, Ω) is the empty set ∅.
ListEval(exprlist, μ) is the GROUP BY key
μ is a row ("solution mapping" in the spec).
ListEval(exprlist, μ) → { μ' ... } is the grouping of rows by GROUP BY.
For each group, there is a set of rows. It's a map from key to set of rows.
When there are no matches to the pattern "WHERE { ?s ?p ?o }", there is
no μ, so there are no entries in this map; { μ' ... } is the empty
set ∅.
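As a rough illustration of the definition, here is a minimal Python sketch of Group. The function name `group`, the use of dicts for rows, and the tuple keys are assumptions made for the illustration; this is neither the spec's notation nor ARQ's implementation.

```python
def group(key_exprs, omega):
    """Map each GROUP BY key to the list of rows (solution mappings) with that key."""
    groups = {}
    for mu in omega:  # the loop body never runs when omega is empty
        # Stands in for ListEval(exprlist, mu): evaluate every key expression on the row.
        key = tuple(expr(mu) for expr in key_exprs)
        groups.setdefault(key, []).append(mu)
    return groups


by_s = [lambda mu: mu["s"]]  # hypothetical stand-in for GROUP BY ?s

rows = [{"s": "a", "o": 1}, {"s": "a", "o": 2}, {"s": "b", "o": 3}]
some_groups = group(by_s, rows)  # two keys: ("a",) and ("b",)
no_groups = group(by_s, [])      # no rows, so no keys at all

print(some_groups)
print(no_groups)
```

With no rows the key expressions are never evaluated, so the resulting map has no entries.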
From here, the rest of the definitions simplify down to empty.
Aggregation(exprlist, func, scalarvals, ∅) is ∅.
AggregateJoin(∅) = ∅.
Flatten(∅) = ∅.
COUNT(∅) = 0 (xsd:integer zero)
SUM(∅) = 0
AVG(∅) = 0
MIN, MAX, SAMPLE are errors
GROUP_CONCAT = ""
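As a loose analogy only, some Python built-ins happen to behave the same way on an empty list as the set functions listed here do on the empty multiset. The mapping from aggregate names to built-ins is an assumption made for the illustration (AVG and SAMPLE have no direct built-in counterpart and are left out); it says nothing about how any SPARQL engine is written.

```python
empty = []  # stands in for the empty multiset

# Hypothetical stand-ins: Python built-ins playing the role of SPARQL set functions.
set_functions = {
    "COUNT": len,
    "SUM": sum,
    "MIN": min,
    "MAX": max,
    "GROUP_CONCAT": " ".join,
}

results = {}
for name, fn in set_functions.items():
    try:
        results[name] = fn(empty)
    except ValueError:  # Python's min/max raise ValueError on empty input
        results[name] = "error"

print(results)
# {'COUNT': 0, 'SUM': 0, 'MIN': 'error', 'MAX': 'error', 'GROUP_CONCAT': ''}
```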
Throughout this, it does not matter if there is a GROUP BY clause or not
because "Group" is the empty set ∅ in both cases.
This may be surprising if you are used to SQL because in SQL with GROUP
BY you get no rows, and COUNT is never zero, whereas with no GROUP BY,
you get one row and a COUNT of zero.
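The SQL behaviour described here can be checked with Python's built-in sqlite3 module. The table name `triples` and its single row are made up for the illustration, and `WHERE 0` plays the role of FILTER(false). This exercises SQLite only, not every SQL engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.execute("INSERT INTO triples VALUES ('a', 'b', 'c')")

# No GROUP BY: an aggregate query returns exactly one row, even with no input rows.
no_group = conn.execute("SELECT COUNT(*) FROM triples WHERE 0").fetchall()

# GROUP BY: one row per group, and with no input rows there are no groups.
with_group = conn.execute(
    "SELECT COUNT(*) FROM triples WHERE 0 GROUP BY s"
).fetchall()

print(no_group)    # [(0,)]
print(with_group)  # []
conn.close()
```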
== Results from ARQ
Q1: no matches:
------
| C1 |
======
| 0  |
------
Q2: no matches:
------
| C2 |
======
| 0  |
------
Re: SPARQL Aggregation : spec and implementation
Posted by james anderson <ja...@dydra.com>.
good evening,
> On 2017-11-04, at 17:34, Andy Seaborne <an...@apache.org> wrote:
>
>
>
> On 04/11/17 13:46, james anderson wrote:
>> good afternoon;
>>> On 2017-11-04, at 12:18, Andy Seaborne <an...@apache.org> wrote:
>>>
>>> Following on from the thread in October "Problem with MAX when no result expected" <https://s.apache.org/03Hz> here is an investigation into SPARQL Aggregation, both what the spec says and what ARQ does.
>>>
>>> It would be interesting to get reports about other implementations for these two queries:
>>>
>>> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
>>> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s
>
> For clarity:
>
>> http://dydra.com/jhacker/foaf/@query#q1 : 0
>
> Is that zero rows or one row with a single column with C1 being value 0?
>
>> http://dydra.com/jhacker/foaf/@query#q2 : no result
>
> Is that zero rows or execution error/illegal query?
the urls locate runnable queries.
if you hit the “run” button, the answers to those questions should appear.
>
>>> Throughout this, it does not matter if there is a GROUP BY clause or not because "Group" is the empty set ∅ in both cases.
>> there is a reading of the recommendation text according to which the two sets which are empty are not of the same kind.
>> in one case, the set is that over which the aggregation operations run and in that case, the result is 0.
>> in the other case it is the set of groups, in which case, there is no set of solutions over which to run the aggregation and therefore there is no result.
>
> Maybe, maybe not. Please quote the text that gives the alternative reading. I don't see a reading that mentions kinds.
>
> I missed the fact that when there is no GROUP BY the group key is some constant like 1.
>
> But that does not change anything because Group() is defined in a way that uses the rows matching WHERE{} in a foreach and foreach of Ω = empty set is the empty set.
>
> Group(exprlist, Ω) = { ListEval(exprlist, μ) -> ... | μ in Ω }
>
> so if there are no μ, ListEval is not evaluated and Group() is the empty set. There is only one empty set. It does not carry any indication of use of GROUP BY or not.
yes, but it is not a set of solutions, it is a set of keys which are to be used to partition solutions.
>
> By the end of Group() the situation is the same
as what?
> so the result will be the same for whatever reading of the rest of the process (there are some parts I'm not completely confident with yet).
as is evident from my comment above, i do not read the document in a way which agrees with this claim.
>
> Specifically, for no GROUP BY the result is not a map of the constant to the empty multiset {1 -> {}}
agreed, it is a set of solutions.
which may be empty, but is not the same thing as an empty set of mappings from keys to sets of solutions.
>
>> is there some way to read the text such that the empty set of groups transforms into an empty set of solutions?
>
> Not that I can see; it should be covered by {1 -> {}} vs {} but that distinction is lost at Group().
i read the passage which describes aggregation under 18.5
Aggregation(exprlist, func, scalarvals, { key1→Ω1, ..., keym→Ωm } ) = { (key, F(Ω)) | key → Ω in { key1→Ω1, ..., keym→Ωm } }
to reduce, in the case where no key is generated, to
Aggregation(exprlist, func, scalarvals, { } )
how else is that to be read?
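
One way to see what is at stake in this reading is a small Python sketch of the Aggregation step. The function `aggregation` is a simplification made for illustration (it drops exprlist and scalarvals and takes the set function directly); it is not taken from the spec text or from any implementation.

```python
def aggregation(func, groups):
    """Sketch of { (key, F(Ω)) | key → Ω in groups } as a dict from key to value."""
    return {key: func(omega) for key, omega in groups.items()}


no_groups = {}             # empty map: no key was generated
one_empty_group = {1: []}  # constant key 1 mapped to an empty multiset

print(aggregation(len, no_groups))        # {}
print(aggregation(len, one_empty_group))  # {1: 0}
```

Under this sketch the empty map yields no (key, value) pairs at all, while a single empty group yields a count of 0.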
best regards, from berlin,
---
james anderson | james@dydra.com | http://dydra.com
Re: SPARQL Aggregation : spec and implementation
Posted by Andy Seaborne <an...@apache.org>.
On 04/11/17 13:46, james anderson wrote:
> good afternoon;
>
>> On 2017-11-04, at 12:18, Andy Seaborne <an...@apache.org> wrote:
>>
>> Following on from the thread in October "Problem with MAX when no result expected" <https://s.apache.org/03Hz> here is an investigation into SPARQL Aggregation, both what the spec says and what ARQ does.
>>
>> It would be interesting to get reports about other implementations for these two queries:
>>
>> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
>> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s
>
For clarity:
> http://dydra.com/jhacker/foaf/@query#q1 : 0
Is that zero rows or one row with a single column with C1 being value 0?
> http://dydra.com/jhacker/foaf/@query#q2 : no result
Is that zero rows or execution error/illegal query?
>> Throughout this, it does not matter if there is a GROUP BY clause or not because "Group" is the empty set ∅ in both cases.
>
> there is a reading of the recommendation text according to which the two sets which are empty are not of the same kind.
> in one case, the set is that over which the aggregation operations run and in that case, the result is 0.
> in the other case it is the set of groups, in which case, there is no set of solutions over which to run the aggregation and therefore there is no result.
Maybe, maybe not. Please quote the text that gives the alternative
reading. I don't see a reading that mentions kinds.
I missed the fact that when there is no GROUP BY the group key is some
constant like 1.
But that does not change anything because Group() is defined in a way
that uses the rows matching WHERE{} in a foreach and foreach of Ω =
empty set is the empty set.
Group(exprlist, Ω) = { ListEval(exprlist, μ) -> ... | μ in Ω }
so if there are no μ, ListEval is not evaluated and Group() is the empty
set. There is only one empty set. It does not carry any indication of
use of GROUP BY or not.
By the end of Group() the situation is the same so the result will be
the same for whatever reading of the rest of the process (there are
some parts I'm not completely confident with yet).
Specifically, for no GROUP BY the result is not a map of the constant to
the empty multiset {1 -> {}}
> is there some way to read the text such that the empty set of groups transforms into an empty set of solutions?
Not that I can see; it should be covered by {1 -> {}} vs {} but that
distinction is lost at Group().
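
To make the point concrete, here is a hedged Python sketch that writes Group() as a comprehension over the rows, mirroring the set-builder form quoted in this message. The names `group`, `constant_key` and `by_s` are hypothetical, and rows are plain dicts.

```python
def group(key_expr, omega):
    """Mirror of { ListEval(exprlist, μ) -> ... | μ in Ω } as a dict comprehension."""
    return {
        key_expr(mu): [m for m in omega if key_expr(m) == key_expr(mu)]
        for mu in omega
    }


constant_key = lambda mu: 1  # no GROUP BY: every row gets the same key
by_s = lambda mu: mu["s"]    # GROUP BY ?s

one_row = [{"s": "a"}]
print(group(constant_key, one_row))  # {1: [{'s': 'a'}]}

# With no rows, both forms give the same empty map, never {1: []}.
print(group(constant_key, []))       # {}
print(group(by_s, []))               # {}
```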
Andy
Re: SPARQL Aggregation : spec and implementation
Posted by james anderson <ja...@dydra.com>.
good afternoon;
> On 2017-11-04, at 12:18, Andy Seaborne <an...@apache.org> wrote:
>
> Following on from the thread in October "Problem with MAX when no result expected" <https://s.apache.org/03Hz> here is an investigation into SPARQL Aggregation, both what the spec says and what ARQ does.
>
> It would be interesting to get reports about other implementations for these two queries:
>
> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o FILTER(false) }
> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o FILTER(false) } GROUP BY ?s
http://dydra.com/jhacker/foaf/@query#q1 : 0
http://dydra.com/jhacker/foaf/@query#q2 : no result
>
> It doesn't matter what the data is - the WHERE {} does not match anything. Or use Q1,Q2 below with an empty dataset.
>
> == tl;dr
>
> I think that ARQ follows the spec but the spec is not what you might expect from an SQL background.
i have no background implementing sql.
>
> ==
>
> The original question arose from the case of no matches to the WHERE clause.
>
> * What happens when there is no GROUP BY
> * What happens when there is a GROUP BY
>
> Q1: SELECT (COUNT(*) AS ?C1) { ?s ?p ?o }
> Q2: SELECT (COUNT(*) AS ?C2) { ?s ?p ?o } GROUP BY ?s
>
> with no matches to the graph pattern { ?s ?p ?o }.
> (beware below! {} are being used in different ways!)
>
> == The SPARQL 1.1 specification
>
> In the definition of "Group",
> https://www.w3.org/TR/sparql11-query/#defn_algGroup
>
> If the pattern does not match, then Group(exprlist, Ω) is the empty set ∅.
>
> ListEval(exprlist, μ) is the GROUP BY key
> μ is a row ("solution mapping" in the spec).
>
> ListEval(exprlist, μ) → { μ' ... } is the grouping of rows by GROUP BY. For each group, there is a set of rows. It's a map from key to set of rows.
>
> When there are no matches to the pattern "WHERE { ?s ?p ?o }", there is no μ, so there are no entries in this map; { μ' ... } is the empty set ∅.
>
> From here, the rest of the definitions simplify down to empty.
>
> Aggregation(exprlist, func, scalarvals, ∅) is ∅.
> AggregateJoin(∅) = ∅.
> Flatten(∅) = ∅.
>
> COUNT(∅) = 0 (xsd:integer zero)
> SUM(∅) = 0
> AVG(∅) = 0
> MIN, MAX, SAMPLE are errors
> GROUP_CONCAT = ""
>
> Throughout this, it does not matter if there is a GROUP BY clause or not because "Group" is the empty set ∅ in both cases.
there is a reading of the recommendation text according to which the two sets which are empty are not of the same kind.
in one case, the set is that over which the aggregation operations run and in that case, the result is 0.
in the other case it is the set of groups, in which case, there is no set of solutions over which to run the aggregation and therefore there is no result.
is there some way to read the text such that the empty set of groups transforms into an empty set of solutions?
---
james anderson | james@dydra.com | http://dydra.com