Posted to users@jena.apache.org by Emanuele Della Valle <em...@polimi.it> on 2013/12/03 16:10:50 UTC

how to implement a new aggregation in ARQ

Dear all,

Marco and I would like to implement a new aggregation in ARQ. 

We understand that a textual query is translated into expressions and then into an algebraic execution plan. We found the aggregation expressions [1], but we cannot find the operators that implement them in [2].
Can you help us?

Bests,

Emanuele 

[1] http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/sparql/expr/aggregate/package-frame.html
[2] http://jena.apache.org/documentation/javadoc/arq/com/hp/hpl/jena/sparql/algebra/op/package-frame.html

--
prof. Emanuele Della Valle
DEIB - Politecnico di Milano
m. +393389375810
w. http://emanueledellavalle.org


Re: how to implement a new aggregation in ARQ

Posted by Andy Seaborne <an...@apache.org>.
On 10/12/13 11:04, Emanuele Della Valle wrote:
> Hi Andy and all,
>
> thank you very much for the clear answer. We did not figure out that
> the aggregation functions are arguments to the group operator. As you
> pointed out, it was there under our eyes, but we were not seeing it
> :-)

:-)

>
> It's a pity that ARQ does not have a registry of aggregates. I
> understand your proposal to tweak the parser process and to add an
> AggregationRegistry. Still, this will require us to branch ARQ, won't
> it?

Yes.

Do you want new keywords for new aggregates or the possibility of 
URI-function syntax for aggregates?  or both?

> We are happy to contribute new aggregates to the ARQ code,

Great!

> but we
> probably need some help in designing and implementing the
> AggregationRegistry, can you help?

(in Fuseki)
org.apache.jena.fuseki.migrate.Registry<T>

which should really be rolled back into FunctionRegistry and
PropertyFunctionRegistry, but that is hardly a priority since those
work, albeit predating Registry<T>.
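To illustrate the idea only, a URI-keyed registry of this kind could look like the plain-Java sketch below. It is not the Fuseki Registry<T> source; the class SimpleRegistry and its methods are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a generic registry keyed by URI. It follows the same
// idea as Fuseki's Registry<T> but is not that class's source code.
class SimpleRegistry<T> {
    // ConcurrentHashMap so registration and lookup can happen from different threads.
    private final Map<String, T> entries = new ConcurrentHashMap<>();

    // Register (or replace) the entry for a URI.
    void put(String uri, T value) {
        entries.put(uri, value);
    }

    // Look up a URI; empty if nothing is registered under it.
    Optional<T> get(String uri) {
        return Optional.ofNullable(entries.get(uri));
    }

    boolean isRegistered(String uri) {
        return entries.containsKey(uri);
    }

    void remove(String uri) {
        entries.remove(uri);
    }
}
```

Under this assumption, an AggregationRegistry could simply be a SimpleRegistry holding aggregate factories keyed by URI, next to the existing function registries.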

	Andy

>
> Cheers,
>
> Emanuele

Which aggregates do you want to add BTW?

Re: how to implement a new aggregation in ARQ

Posted by Emanuele Della Valle <em...@polimi.it>.
Hi Andy and all,

thank you very much for the clear answer. We did not figure out that the aggregation functions are arguments to the group operator. As you pointed out, it was there under our eyes, but we were not seeing it :-)

It's a pity that ARQ does not have a registry of aggregates. I understand your proposal to tweak the parser process and to add an AggregationRegistry. Still, this will require us to branch ARQ, won't it? 

We are happy to contribute new aggregates to the ARQ code, but we probably need some help in designing and implementing the AggregationRegistry. Can you help?

Cheers,

Emanuele

On Dec 5, 2013, at 12:02 PM, Andy Seaborne <an...@apache.org> wrote:



Re: how to implement a new aggregation in ARQ

Posted by Andy Seaborne <an...@apache.org>.
On 03/12/13 15:10, Emanuele Della Valle wrote:
> Dear all,
>
> Marco and I would like to implement a new aggregation in ARQ.
>
> We understand that a textual query is translated into expressions and,
> then, into an algebraic execution plan. We found the aggregation
> expressions [1], but we cannot find the operators that implement them
> in [2] Can you help us?

Aggregates are only used in (group), so they appear as arguments to the
(group) operator. They aren't top-level operators per se because they
only have meaning as part of the grouping process that feeds them
their inputs.
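A minimal plain-Java sketch of that idea, assuming rows are partitioned by a group key and each group gets its own accumulator: the aggregate never sees the whole input, only what the grouping step feeds it. Accumulator, CountAcc and GroupOp are hypothetical names, not ARQ's classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical accumulator: one instance per group, fed only that group's rows.
interface Accumulator<R, V> {
    void accumulate(R row);
    V getValue();
}

// COUNT(*): counts every row in its group.
class CountAcc<R> implements Accumulator<R, Long> {
    private long n = 0;

    public void accumulate(R row) { n++; }

    public Long getValue() { return n; }
}

final class GroupOp {
    private GroupOp() {}

    // Partition rows by key, create one accumulator per group on first sight,
    // and return group key -> aggregate value (in first-seen key order).
    static <R, K, V> Map<K, V> group(List<R> rows,
                                     Function<R, K> key,
                                     Supplier<? extends Accumulator<R, V>> aggFactory) {
        Map<K, Accumulator<R, V>> accs = new LinkedHashMap<>();
        for (R row : rows) {
            accs.computeIfAbsent(key.apply(row), k -> aggFactory.get()).accumulate(row);
        }
        Map<K, V> out = new LinkedHashMap<>();
        accs.forEach((k, acc) -> out.put(k, acc.getValue()));
        return out;
    }
}
```

The design point is that the aggregate is a factory argument to the grouping step, which mirrors how (group (?s) ((?.0 (count)))) carries its aggregates.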

arq.qparse can help here, as it prints the algebra. For example:

~/tmp >> qparse --print=query --print=op --file Q.rq
SELECT  (count(*) AS ?C)
WHERE
   { ?s ?p ?o }
GROUP BY ?s
HAVING ( count(*) > 5 )
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(project (?C)
   (filter (> ?.0 5)
     (extend ((?C ?.0))
       (group (?s) ((?.0 (count)))
         (bgp (triple ?s ?p ?o))))))
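As a rough illustration of reading that plan bottom-up, here is a hand-written Java evaluation of it over triples modelled as String[] of subject, predicate, object. This is a sketch under that assumption, not how ARQ executes queries; PlanDemo is a hypothetical name. Note that HAVING becomes an ordinary (filter) over the internal variable ?.0 after grouping.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, hand-written evaluation of the plan printed by qparse.
// Solutions are modelled as Map<String, Object> from variable name to value.
final class PlanDemo {
    private PlanDemo() {}

    static List<Map<String, Object>> evaluate(List<String[]> triples) {
        // (group (?s) ((?.0 (count)))) over (bgp (triple ?s ?p ?o)):
        // every triple matches the pattern; count the matches per distinct ?s.
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String[] t : triples) {
            counts.merge(t[0], 1L, Long::sum);
        }
        List<Map<String, Object>> results = new ArrayList<>();
        for (Map.Entry<String, Long> group : counts.entrySet()) {
            // Output of (group): the key ?s plus the aggregate in internal variable ?.0.
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("s", group.getKey());
            row.put(".0", group.getValue());
            // (extend ((?C ?.0))): bind ?C to the aggregate value.
            row.put("C", row.get(".0"));
            // (filter (> ?.0 5)): the HAVING clause, applied after grouping.
            if ((Long) row.get(".0") > 5) {
                // (project (?C)): keep only ?C.
                results.add(Map.of("C", row.get("C")));
            }
        }
        return results;
    }
}
```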

or, when there is no GROUP BY clause:

SELECT  (count(*) AS ?C)
WHERE
   { ?s ?p ?o }
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(project (?C)
   (extend ((?C ?.0))
     (group () ((?.0 (count)))
       (bgp (triple ?s ?p ?o)))))
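The empty group key () means all solutions form a single group. A hedged sketch of the difference, assuming solutions are modelled as maps from variable names to values: with GROUP BY and no input there are no groups at all, while with no GROUP BY the one implicit group still exists, so COUNT(*) typically yields a single row with 0. GroupCountDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch contrasting (group (?s) ...) with (group () ...).
final class GroupCountDemo {
    private GroupCountDemo() {}

    // rows: solutions as variable -> value maps; keyVars: the GROUP BY variables.
    // Returns group key (list of key values) -> count(*) for that group.
    static Map<List<Object>, Long> groupCount(List<Map<String, Object>> rows, List<String> keyVars) {
        Map<List<Object>, Long> groups = new LinkedHashMap<>();
        if (keyVars.isEmpty()) {
            // (group () ...): one implicit group with the empty key, even if rows is empty.
            groups.put(List.of(), 0L);
        }
        for (Map<String, Object> row : rows) {
            // Build this row's key; an empty key list equals List.of(), so all
            // rows fall into the single implicit group when keyVars is empty.
            List<Object> key = new ArrayList<>();
            for (String v : keyVars) {
                key.add(row.get(v));
            }
            groups.merge(key, 1L, Long::sum);
        }
        return groups;
    }
}
```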


There is also an online version (also included in Fuseki):
http://www.sparql.org/query-validator.html

Aggregates are only meaningful in certain places: the SELECT clause,
HAVING, and ORDER BY (the last being obscure).

In SPARQL, custom aggregates named by URI are not distinguished in the
syntax. They look like function calls, except that they allow the
keyword DISTINCT in their arguments.
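Since DISTINCT is the one visible syntactic difference, here is what it amounts to at evaluation time, sketched in plain Java: within a group, repeated values are dropped before the aggregate sees them. DistinctFeed is a hypothetical wrapper, with the aggregate modelled as a Consumer of values.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical wrapper for AGG(DISTINCT ?x): passes each distinct value only
// once to the underlying aggregate. Use one instance per group, so that
// de-duplication happens within a group, not across groups.
final class DistinctFeed<V> implements Consumer<V> {
    private final Set<V> seen = new HashSet<>();
    private final Consumer<V> inner;

    DistinctFeed(Consumer<V> inner) {
        this.inner = inner;
    }

    @Override
    public void accept(V value) {
        // Set.add returns false for a value already seen; such values are skipped.
        if (seen.add(value)) {
            inner.accept(value);
        }
    }
}
```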

What ARQ does not have is a registry of aggregates: the ones it supports
are limited to those the parser knows about. The execution engine doesn't
have a fixed set. You'll need to tweak the parsing process; one way is to
look up any function URI in a new AggregationRegistry to see whether it is
a function or an aggregate, and proceed accordingly.

Example:

SELECT (my:something(?x) AS ?X) { ... }

you can't tell from the syntax whether that's an aggregate or a plain
custom extension function [*].
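The parse-time decision described above might be sketched as follows, assuming an aggregate registry keyed by URI. The CallClassifier class and its set of URIs are hypothetical, not ARQ parser code, and treating DISTINCT on a plain function as an error is a choice made for this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical parse-time classifier for calls like my:something(?x):
// the syntax alone can't say whether the URI names an aggregate or a
// plain extension function, so the answer comes from a registry lookup.
final class CallClassifier {
    enum Kind { AGGREGATE, FUNCTION }

    // Assumed registry contents: URIs of known custom aggregates.
    private final Set<String> aggregateUris = new HashSet<>();

    void registerAggregate(String uri) {
        aggregateUris.add(uri);
    }

    // Decide how the parser should build the expression for a call to uri.
    Kind classify(String uri, boolean distinct) {
        if (aggregateUris.contains(uri)) {
            return Kind.AGGREGATE;
        }
        // Design choice in this sketch: DISTINCT only makes sense for aggregates.
        if (distinct) {
            throw new IllegalArgumentException("DISTINCT used with non-aggregate function <" + uri + ">");
        }
        return Kind.FUNCTION;
    }
}
```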

There are other aggregates that could usefully be added to the general
distribution, more statistical ones being the obvious candidates (to me!).
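As an example of a statistical aggregate, a sample-variance accumulator can be written with Welford's one-pass update, which fits the per-group accumulator model because each value is consumed once. VarianceAcc is a hypothetical name, and returning NaN for fewer than two values is an assumption of this sketch (a real aggregate might leave the result unbound instead).

```java
// Hypothetical statistical aggregate: sample variance over one group's
// numeric values, using Welford's one-pass update.
final class VarianceAcc {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    void accumulate(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        // Uses the updated mean, as in Welford's algorithm.
        m2 += delta * (x - mean);
    }

    // Sample variance (n - 1 denominator); NaN when fewer than two values were seen.
    double getValue() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}
```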

	Andy

[*]
Personally, I think that aggregates and functions should have separate
syntax, e.g. AGG(uri, args, ...) or AGG(uri(args, ...)), and if that works
better for you, we can add it to the extended language.
