Posted to user@cassandra.apache.org by Brian O'Neill <bo...@alumni.brown.edu> on 2011/11/01 22:02:48 UTC

R on Cassandra

I saw a mention of R on Cassandra:
http://comments.gmane.org/gmane.comp.db.cassandra.user/5681

Does anyone know if this has traction somewhere?

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/

Re: R on Cassandra

Posted by Paul Brown <pa...@gmail.com>.
Hi, Brian --

A little late to reply, but I'm slowly catching up.

You'll be better off, IMHO, pulling the data out of Cassandra with a tool like Pig (probably with a bit of aggregation and filtering) and then operating on it in R as a static delimited file.  If you need additional automation or batching (as well as cleaning and aggregation), you can automate that with various tools.  Some of this depends on your modeling workflow, but you should expect to return to exactly the same dataset and repeat some processes as you refine your approach.  That is difficult, if not impossible, to do against live data, because the data keeps changing between runs.
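
To make the "exactly the same dataset" point concrete, here is a minimal Python sketch of one way to freeze an extract as a tab-delimited file and record a checksum for it. Everything in it is an assumption for illustration: the column names, the sample rows, the file name snapshot.tsv, and the helper names write_snapshot and file_digest are made up, and the rows simply stand in for whatever the export step (Pig or otherwise) produces.

```python
import csv
import hashlib
import os
import tempfile


def file_digest(path):
    """Hash the file's bytes so a later run can confirm the data is unchanged."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def write_snapshot(rows, header, path):
    """Write rows to a tab-delimited file and return its SHA-256 hex digest."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)
    return file_digest(path)


# Made-up rows standing in for the output of the export step.
header = ["row_key", "region", "event_count"]
rows = [("k1", "east", 12), ("k2", "west", 7)]

snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.tsv")
recorded = write_snapshot(rows, header, snapshot_path)

# Later, before re-running a model, check the file is the one you recorded.
assert file_digest(snapshot_path) == recorded
```

A file written this way can be loaded in R with read.delim("snapshot.tsv"), and keeping the recorded digest next to your analysis notes lets you confirm that a later run is using the same extract.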

-- Paul
