
Posted to dev@cassandra.apache.org by David Semeria <da...@lmframework.com> on 2014/02/27 14:50:32 UTC

Node side processing

Hi List,

I was wondering whether there have been any past proposals for
implementing node side processing (NSP) in C*. By NSP, I mean
passing a reference to a Java class which would then process the result
set before it is returned to the client.

In our particular use case our clients typically loop through result 
sets of a million or more rows to produce a tiny amount of output (sums, 
means, variance, etc). The bottleneck -- quite obviously -- is the need 
to transfer a million rows to the client before processing can take 
place. It would be extremely useful to execute this processing on the 
coordinator node and only transfer the results to the client.
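
As an illustration of the kind of processing described above, here is a minimal sketch in Java. It assumes a hypothetical class, RunningStats, that folds values in one at a time; it is not part of any Cassandra API, and how such a class would be referenced or loaded by a node is left open. It uses Welford's single-pass update so the sum, mean and variance are available without holding the rows in memory.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical single-pass aggregator (illustration only, not a Cassandra API).
 * Keeps a constant amount of state regardless of how many rows are processed.
 */
final class RunningStats {
    private long count;
    private double sum;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    /** Fold one value into the aggregates using Welford's update. */
    void accept(double x) {
        count++;
        sum += x;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() { return count; }

    double sum() { return sum; }

    /** Mean of the values seen so far; 0.0 if none have been seen. */
    double mean() { return mean; }

    /** Population variance; NaN if no values have been seen. */
    double variance() { return count == 0 ? Double.NaN : m2 / count; }

    public static void main(String[] args) {
        // Stand-in for one numeric column of a large result set.
        List<Double> column = Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        RunningStats stats = new RunningStats();
        for (double value : column) {
            stats.accept(value);
        }
        System.out.printf("n=%d sum=%.1f mean=%.3f variance=%.3f%n",
                stats.count(), stats.sum(), stats.mean(), stats.variance());
    }
}
```

With something along these lines running on the coordinator, only the four aggregate values would need to cross the wire rather than every row.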

I mention this here because I can imagine other C* users having similar 
requirements.

Thanks

D.

Re: Node side processing

Posted by Edward Capriolo <ed...@gmail.com>.

Check intravert on GitHub. I am working to get many of those features into
Cassandra.

On Thursday, February 27, 2014, Brandon Williams <dr...@gmail.com> wrote:
> A few:
>
> https://issues.apache.org/jira/browse/CASSANDRA-4914
>
> https://issues.apache.org/jira/browse/CASSANDRA-5184
>
> https://issues.apache.org/jira/browse/CASSANDRA-6704
>
> https://issues.apache.org/jira/browse/CASSANDRA-6167


Re: Node side processing

Posted by Brandon Williams <dr...@gmail.com>.

A few:

https://issues.apache.org/jira/browse/CASSANDRA-4914

https://issues.apache.org/jira/browse/CASSANDRA-5184

https://issues.apache.org/jira/browse/CASSANDRA-6704

https://issues.apache.org/jira/browse/CASSANDRA-6167




Re: Node side processing

Posted by Tupshin Harper <tu...@tupshin.com>.

Hi David,

Check out the ongoing discussion in
https://issues.apache.org/jira/browse/CASSANDRA-6704 as well as some
related tickets linked to from that one.

No consensus at this point, but I'm personally hoping to see something
along the general lines of Hive's UDFs.

-Tupshin

