Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2012/11/09 10:58:14 UTC

[jira] [Commented] (CASSANDRA-4914) Aggregate functions in CQL

    [ https://issues.apache.org/jira/browse/CASSANDRA-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13493857#comment-13493857 ] 

Sylvain Lebresne commented on CASSANDRA-4914:
---------------------------------------------

I'm not necessarily opposed to the idea in principle, but unless we come up with something clever, this will only save network traffic between the client and the coordinator, as internally we'll still have to pull all the data (though that's not very different from what we do for count today). So if we do that, we should be clear about that fact, and people should still go the Hadoop route for large aggregations.

I'm also halfway convinced that it wouldn't be much harder to support custom "filter" functions, i.e. to allow people to define some class having a method along the lines of:
{noformat}
public ResultSet filter(ResultSet rs);
{noformat}
and so it might be worth going that more general route right away and just providing a number of default aggregation functions.
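
To make the idea a bit more concrete, here is a minimal sketch of what such a pluggable filter and one default aggregate could look like. Everything in it is an assumption for illustration: `Rows`, `ResultFilter` and `SumFilter` are hypothetical names, and `Rows` is a simplified stand-in for a result set, not Cassandra's actual `ResultSet` class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a result set: each row maps a column name to its value.
// This is NOT Cassandra's ResultSet; it only exists to keep the sketch self-contained.
final class Rows {
    final List<Map<String, Object>> rows;

    Rows(List<Map<String, Object>> rows) {
        this.rows = rows;
    }
}

// Hypothetical plug-in point, mirroring the filter(ResultSet) signature above.
interface ResultFilter {
    Rows filter(Rows input);
}

// One possible default aggregate: collapses the input into a single row
// holding the sum of one numeric column.
final class SumFilter implements ResultFilter {
    private final String column;

    SumFilter(String column) {
        this.column = column;
    }

    @Override
    public Rows filter(Rows input) {
        double total = 0.0;
        for (Map<String, Object> row : input.rows) {
            // Assumes the column is present and numeric in every row.
            total += ((Number) row.get(column)).doubleValue();
        }
        Map<String, Object> result = new HashMap<>();
        result.put("sum(" + column + ")", total);
        List<Map<String, Object>> out = new ArrayList<>();
        out.add(result);
        return new Rows(out);
    }
}
```

Note that a filter like this still receives every row, so it would run on the coordinator after all the data has been pulled; it only reduces what is sent back to the client.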

I'm also not sure it's wise to support this until we can properly page CQL queries (i.e. I think this should depend on CASSANDRA-4415). Also, I think it would be weird to introduce aggregation before we remove the current arbitrary limit on SELECT (though I'm in favor of doing that sooner rather than later: CASSANDRA-4918).

Lastly, aggregation might lose a bit of its usefulness without proper support for DISTINCT. So overall my opinion would be: if we do this, let's push it to 1.3 and do it correctly.
                
> Aggregate functions in CQL
> --------------------------
>
>                 Key: CASSANDRA-4914
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4914
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Vijay
>            Assignee: Vijay
>             Fix For: 1.2.1
>
>
> The requirement is to aggregate data in Cassandra (wide rows of column values of int, double, float, etc.),
> with some basic aggregate functions like AVG, SUM, Mean, Min, Max, etc. (for the columns within a row).
> Example:
> SELECT * FROM emp WHERE empID IN (130) ORDER BY deptID DESC;                                    
>  empid | deptid | first_name | last_name | salary
> -------+--------+------------+-----------+--------
>    130 |      3 |     joe    |     doe   |   10.1
>    130 |      2 |     joe    |     doe   |    100
>    130 |      1 |     joe    |     doe   |  1e+03
>  
> SELECT sum(salary), empid FROM emp WHERE empID IN (130);
>  sum(salary) | empid
> -------------+-------
>       1110.1 |   130

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira