Posted to dev@cassandra.apache.org by David Semeria <da...@lmframework.com> on 2014/02/27 14:50:32 UTC
Node side processing
Hi List,
I was wondering whether there have been any past proposals for
implementing node side processing (NSP) in C*. By NSP, I mean
passing a reference to a Java class which would then process the result
set before it is returned to the client.
In our particular use case our clients typically loop through result
sets of a million or more rows to produce a tiny amount of output (sums,
means, variance, etc). The bottleneck -- quite obviously -- is the need
to transfer a million rows to the client before processing can take
place. It would be extremely useful to execute this processing on the
coordinator node and only transfer the results to the client.
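To make the kind of processing concrete, here is a minimal sketch of a single-pass aggregator of the sort a client might want to run next to the data. It is an illustration only: the class name RunningStats and its methods are hypothetical, it uses no Cassandra API, and it assumes the values of interest can be fed to it one at a time as doubles. The variance is computed with Welford's online algorithm, so only a few numbers of state are kept however many rows are scanned.

```java
import java.util.function.DoubleConsumer;

/**
 * Hypothetical single-pass aggregator (not part of Cassandra).
 * Keeps constant-size state and yields count, sum, mean and sample variance.
 */
final class RunningStats implements DoubleConsumer {
    private long count;   // number of values seen so far
    private double sum;   // running sum of the values
    private double mean;  // running mean (Welford)
    private double m2;    // running sum of squared deviations from the mean

    /** Folds one value into the running statistics. */
    @Override
    public void accept(double value) {
        count++;
        sum += value;
        double delta = value - mean;   // deviation from the old mean
        mean += delta / count;         // update the mean
        m2 += delta * (value - mean);  // uses both the old and the new mean
    }

    long count() { return count; }

    double sum() { return sum; }

    double mean() { return mean; }

    /** Sample variance (divides by n - 1); returns 0 for fewer than two values. */
    double variance() {
        return count > 1 ? m2 / (count - 1) : 0.0;
    }
}
```

With something like this running on the coordinator, a scan of a million rows would send back four numbers instead of the rows themselves. How such a class would be referenced, loaded and sandboxed on the node is exactly the open question.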
I mention this here because I can imagine other C* users having similar
requirements.
Thanks
D.
Re: Node side processing
Posted by Edward Capriolo <ed...@gmail.com>.
Check out intravert on GitHub. I am working to get many of those features into
Cassandra.
--
Sorry this was sent from mobile. Will do less grammar and spell check than
usual.
Re: Node side processing
Posted by Brandon Williams <dr...@gmail.com>.
A few:
https://issues.apache.org/jira/browse/CASSANDRA-4914
https://issues.apache.org/jira/browse/CASSANDRA-5184
https://issues.apache.org/jira/browse/CASSANDRA-6704
https://issues.apache.org/jira/browse/CASSANDRA-6167
Re: Node side processing
Posted by Tupshin Harper <tu...@tupshin.com>.
Hi David,
Check out the ongoing discussion in
https://issues.apache.org/jira/browse/CASSANDRA-6704 as well as some
related tickets linked to from that one.
No consensus at this point, but I'm personally hoping to see something
along the general lines of Hive's UDFs.
-Tupshin