Posted to user@cassandra.apache.org by Francisco Reyes <li...@natserv.net> on 2016/01/28 18:06:09 UTC

Are aggregate functions done in parallel?

Does Cassandra parallelize aggregate functions?

Have a new project with potentially 200 to 300 million rows per month 
that I need to do aggregates on. Wondering if Cassandra would be a good 
match.

Re: Are aggregate functions done in parallel?

Posted by DuyHai Doan <do...@gmail.com>.

You can read this: http://www.doanduyhai.com/blog/?p=1876 and this:
http://www.doanduyhai.com/blog/?p=2015

Long story short, UDF and UDA computation in Cassandra is not distributed.
All the values are first retrieved on the coordinator node (to apply the
last-write-wins reconciliation logic) before any UDF/UDA is applied.
The sweet spot for Cassandra UDAs is single-partition operations. If you
need to aggregate across multiple partitions, consider using Apache Spark.
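
As a rough sketch of what a single-partition aggregate could look like from a
client: the table events_by_month, its columns, and the helper monthly_total
below are all hypothetical, not something from this thread. The session
argument is assumed to behave like a DataStax Python driver session, that is,
execute(query, parameters) returning a result with a one() method.

```python
# Hypothetical schema, assumed for illustration only:
#   CREATE TABLE events_by_month (
#       source_id text, month text, event_time timestamp, amount bigint,
#       PRIMARY KEY ((source_id, month), event_time));

# Both partition key columns are restricted by equality, so the aggregate
# touches exactly one partition.
QUERY = (
    "SELECT sum(amount) AS total FROM events_by_month "
    "WHERE source_id = %s AND month = %s"
)


def monthly_total(session, source_id, month):
    """Sum `amount` over the single partition (source_id, month)."""
    row = session.execute(QUERY, (source_id, month)).one()
    return row.total
```

With the DataStax Python driver, the session would typically come from
something like Cluster([...]).connect(keyspace). A query without the
partition key restriction would still run, but it would pull every partition
through the coordinator, which is the case where Spark is the better fit.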

On Thu, Jan 28, 2016 at 6:06 PM, Francisco Reyes <li...@natserv.net> wrote:

> Does Cassandra parallelize aggregate functions?
>
> Have a new project with potentially 200 to 300 million rows per month that
> I need to do aggregates on. Wondering if Cassandra would be a good match.
>