Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/09/27 06:08:10 UTC
simple map / table scans without hadoop?
I have a requirement to periodically run full table scans on our data.
It’s mostly for repair tasks or bulk UPDATEs, but I’d prefer to do it in
Java because what I need is fairly trivial.
Pig / Hadoop / etc. are overkill for this. I don’t want or need a whole
Hadoop or HDFS setup.
For example, a full table scan, and if a field matches a regex, set another
column based on that value.
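
As a rough illustration of the per-row step described above (not code from this thread), the regex check can be kept separate from any Cassandra access. The class name `RegexColumnRule` and the method `derive` are hypothetical, and reading the source field and writing the target column are left to whatever client is in use. This is a minimal sketch, assuming both columns are text.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decides what to write to the target column
// based on a regex applied to the source field of one row.
final class RegexColumnRule {
    private final Pattern pattern;

    RegexColumnRule(String regex) {
        // Compile once; the same Pattern is reused for every row scanned.
        this.pattern = Pattern.compile(regex);
    }

    /**
     * Returns the value to store in the target column, or empty if the
     * source field is null or does not match (the row is left untouched).
     */
    Optional<String> derive(String sourceValue) {
        if (sourceValue == null) {
            return Optional.empty();
        }
        Matcher m = pattern.matcher(sourceValue);
        if (!m.find()) {
            return Optional.empty();
        }
        // Prefer the first capture group when the regex defines one,
        // otherwise fall back to the whole match.
        String value = m.groupCount() >= 1 ? m.group(1) : m.group();
        return Optional.ofNullable(value);
    }
}
```

With a made-up pattern such as `^https?://([^/]+)/`, calling `derive("http://example.com/page")` returns an Optional containing `example.com`, which the scanning code would then write back with an UPDATE for that row's key.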
Seems like this wouldn’t be too hard. Just write a daemon that looks at
the key distribution and runs a scan on the data closest to it. It would
be ideal if this ran as a separate daemon, so that you couldn’t
accidentally read all that data into memory and OOM the Cassandra daemon.
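
One way to sketch the scan side of this idea is to cut the token ring into slices and issue one bounded query per slice from the separate process, so that no single read pulls the whole table into memory. The sketch below assumes the cluster uses Murmur3Partitioner (signed 64-bit tokens) and splits the ring evenly rather than reading the real per-node ownership, which a locality-aware daemon would have to obtain from the cluster. The names `TokenRangeSplitter`, `split`, and `scanQuery` are hypothetical.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the full token space into slices that can
// each be scanned with a separate, bounded CQL query.
final class TokenRangeSplitter {
    // Assumption: Murmur3Partitioner, whose tokens fit in a signed 64-bit long.
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger SPAN = BigInteger.ONE.shiftLeft(64);

    /** Returns `parts` contiguous, non-overlapping, inclusive {start, end} ranges. */
    static List<long[]> split(int parts) {
        if (parts < 1) {
            throw new IllegalArgumentException("parts must be >= 1");
        }
        BigInteger n = BigInteger.valueOf(parts);
        List<long[]> ranges = new ArrayList<>(parts);
        for (int i = 0; i < parts; i++) {
            // BigInteger avoids overflow when multiplying by the 2^64 span.
            BigInteger start = MIN.add(SPAN.multiply(BigInteger.valueOf(i)).divide(n));
            BigInteger end = MIN.add(SPAN.multiply(BigInteger.valueOf(i + 1L)).divide(n))
                    .subtract(BigInteger.ONE);
            ranges.add(new long[] { start.longValueExact(), end.longValueExact() });
        }
        return ranges;
    }

    /** Builds a CQL statement with two bind markers for one slice's bounds. */
    static String scanQuery(String table, String partitionKey) {
        return "SELECT * FROM " + table
                + " WHERE token(" + partitionKey + ") >= ?"
                + " AND token(" + partitionKey + ") <= ?";
    }
}
```

Each `long[]` from `split` would be bound to the two `?` markers of the statement from `scanQuery`, and the rows of one slice processed before moving to the next. More slices mean smaller result sets per query; the table and key names passed to `scanQuery` are placeholders for the real schema.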
Does this already exist?
--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
Re: simple map / table scans without hadoop?
Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Sep 26, 2014 at 9:08 PM, Kevin Burton <bu...@spinn3r.com> wrote:
> I have a requirement to periodically run full table scans on our data.
> It’s mostly for repair tasks or bulk UPDATEs, but I’d prefer to do it in
> Java because what I need is fairly trivial.
>
http://wiki.apache.org/cassandra/FAQ#iter_world
?
=Rob