Posted to user@cassandra.apache.org by Li Yang <li...@apache.org> on 2015/12/11 07:52:19 UTC

read time coprocessor?

This is Yang from the Apache Kylin project. We are thinking about using
Cassandra instead of HBase as storage. I have searched and read around, and
still have one question.

Does Cassandra support a read-time coprocessor that moves computation to
the data node before the scan result is returned? That would greatly reduce
network traffic in our case.

Thanks,
Yang

Re: read time coprocessor?

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Dec 11, 2015 at 8:34 AM, DuyHai Doan <do...@gmail.com> wrote:

> The new UDF (User Defined Function) and UDA (User Defined Aggregate)
> features, introduced in Cassandra 2.2, are the closest thing to HBase
> coprocessors.
>

Aren't "Prototype Triggers" (which probably no one should use) closer?

http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support

They still run on the coordinator, FWIW...

=Rob

Re: read time coprocessor?

Posted by DuyHai Doan <do...@gmail.com>.

The new UDF (User Defined Function) and UDA (User Defined Aggregate)
features, introduced in Cassandra 2.2, are the closest thing to HBase
coprocessors.

1. They run in real time, in the sense that they are applied on the fly,
right after the data is fetched from C*.
2. The computation is done on the coordinator, not on the replicas.

The second point may be surprising. One might expect UDF and UDA
computation to be *distributed* among replicas, but because of the eventual
consistency model, data must first be retrieved and reconciled on the
coordinator node (last write wins) before any UDF or UDA is applied.
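
To make the reconcile-then-aggregate order concrete, here is a minimal
Python sketch. It is a toy model, not Cassandra code: the Cell class,
reconcile() and coordinator_sum() are hypothetical names invented for
illustration, and ties on equal timestamps are simply ignored here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Cell:
    """One replica's version of a value (hypothetical model, not a C* type)."""
    value: int
    write_ts: int  # write timestamp; a higher number means a newer write


# A replica's response: partition key -> the cell version that replica holds.
ReplicaResponse = Dict[str, Cell]


def reconcile(responses: Iterable[ReplicaResponse]) -> Dict[str, Cell]:
    """Merge replica responses, keeping the newest cell per key (last write wins)."""
    merged: Dict[str, Cell] = {}
    for response in responses:
        for key, cell in response.items():
            current = merged.get(key)
            if current is None or cell.write_ts > current.write_ts:
                merged[key] = cell
    return merged


def coordinator_sum(responses: List[ReplicaResponse]) -> int:
    """Reconcile first, then aggregate, mimicking where a UDA would run."""
    return sum(cell.value for cell in reconcile(responses).values())


# Replica A missed the update to "k1"; replica B holds the newer write.
replica_a = {"k1": Cell(value=10, write_ts=1), "k2": Cell(value=5, write_ts=1)}
replica_b = {"k1": Cell(value=30, write_ts=2), "k2": Cell(value=5, write_ts=1)}

correct_total = coordinator_sum([replica_a, replica_b])

# Summing on each replica separately yields partial results that can no longer
# be reconciled, because the per-key timestamps are gone after aggregation.
per_replica_totals = [sum(c.value for c in r.values()) for r in (replica_a, replica_b)]
```

In this toy data the coordinator-side total is 35, while the two replica-side
sums are 15 and 35; given only those two numbers, nothing tells the
coordinator which one reflects the latest writes.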

Now, if you're using consistency level ONE or LOCAL_ONE and a client with
the TokenAware load balancing strategy, the coordinator node is itself a
replica. In this particular configuration, UDFs/UDAs are applied locally.

More info on UDF/UDA here:
http://www.slideshare.net/doanduyhai/cassandra-udf-and-materialized-views

On Fri, Dec 11, 2015 at 7:52 AM, Li Yang <li...@apache.org> wrote:

> This is Yang from the Apache Kylin project. We are thinking about using
> Cassandra instead of HBase as storage. I have searched and read around, and
> still have one question.
>
> Does Cassandra support a read-time coprocessor that moves computation to
> the data node before the scan result is returned? That would greatly reduce
> network traffic in our case.
>
> Thanks,
> Yang
>