
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2015/02/18 17:29:11 UTC

[jira] [Commented] (CASSANDRA-8826) Distributed aggregates

    [ https://issues.apache.org/jira/browse/CASSANDRA-8826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326143#comment-14326143 ] 

Sylvain Lebresne commented on CASSANDRA-8826:
---------------------------------------------

I'll note that Cassandra has no ambition of tackling analytic queries itself. There are wonderful frameworks (Hadoop, Spark) that do that better than we probably can. The existing aggregations are meant for 1) aggregating over a (small) portion of a single partition (basically the case where today you'd just query and aggregate client side; in that case, btw, if you use CL.ONE and a token-aware client, distributing the aggregate would buy you nothing) and 2) convenience during development.

I'm not saying there is no way to implement distributed aggregates, but we know it's not trivial either (due to consistency issues in particular), and hence it's imo not worth the complexity of reinventing a poor man's Spark when Spark (and others) exist and are actively developed. Overall, I feel this is out of scope for Cassandra.
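
To make the consistency concern concrete, here is a minimal Python sketch. It is not Cassandra code: it assumes a simplified last-write-wins model in which each cell carries a write timestamp, and the names `Cell`, `reconcile`, `coordinator_sum` and `replica_sum` are hypothetical. It contrasts aggregating after the coordinator has merged replica data with summing independently on each replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: int
    timestamp: int  # write timestamp; the newest write wins (simplified model)


# Two hypothetical replicas of the same partition: row key -> Cell.
# replica_b missed the latest update to "k2" and the insert of "k3".
replica_a = {"k1": Cell(10, 1), "k2": Cell(25, 5), "k3": Cell(7, 6)}
replica_b = {"k1": Cell(10, 1), "k2": Cell(20, 2)}


def reconcile(*replicas):
    """Merge replica contents row by row, keeping the newest cell for each key."""
    merged = {}
    for replica in replicas:
        for key, cell in replica.items():
            current = merged.get(key)
            if current is None or cell.timestamp > current.timestamp:
                merged[key] = cell
    return merged


def coordinator_sum(*replicas):
    """Aggregate on the coordinator, after the rows have been reconciled."""
    return sum(cell.value for cell in reconcile(*replicas).values())


def replica_sum(replica):
    """Aggregate locally on one replica, before any reconciliation."""
    return sum(cell.value for cell in replica.values())


coordinator_result = coordinator_sum(replica_a, replica_b)
partials = [replica_sum(replica_a), replica_sum(replica_b)]

print(coordinator_result)  # 42
print(partials)            # [42, 30]
```

In this toy model the two partial sums disagree, and once each replica has collapsed its rows into a single number the per-row timestamps needed to decide which value is current are gone. A distributed design would have to ship extra state or restrict the consistency levels it supports.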

> Distributed aggregates
> ----------------------
>
>                 Key: CASSANDRA-8826
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8826
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Robert Stupp
>            Priority: Minor
>
> Aggregations have been implemented in CASSANDRA-4914.
> All calculation is performed on the coordinator. This means that all data is pulled by the coordinator and processed there.
> This ticket is about distributing aggregates to make them more efficient. Some related tickets (esp. CASSANDRA-8099) are currently in progress - we should wait for them to land before talking about implementation.
> Another playground (not covered by this ticket) that might be related is _distributed filtering_.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)