Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/09/27 06:08:10 UTC

simple map / table scans without hadoop?

I have a requirement to periodically run full table scans on our data.
It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer to do
it in Java because I only need something fairly trivial.

Pig, Hadoop, etc. are overkill for this.  I don’t want or need a whole
Hadoop or HDFS setup.

For example: do a full table scan and, if a field matches a regex, set
another column based on that value.
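
To make the per-row step concrete, here is a minimal sketch in plain Java of the "match a regex, derive another column" rule. It deliberately contains no Cassandra driver calls; rows are modelled as a simple Map, and the class name RegexColumnRule and the column names used with it are hypothetical, chosen only for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of any Cassandra API): decides what to
// write into a target column based on a regex match on a source column.
final class RegexColumnRule {
    private final Pattern pattern;
    private final String sourceColumn;
    private final String targetColumn;

    RegexColumnRule(String regex, String sourceColumn, String targetColumn) {
        this.pattern = Pattern.compile(regex);
        this.sourceColumn = sourceColumn;
        this.targetColumn = targetColumn;
    }

    // Name of the column the caller should UPDATE when a row matches.
    String targetColumn() {
        return targetColumn;
    }

    // Returns the new value for the target column, or empty if the row
    // has no source value or the regex does not match it.
    Optional<String> newValueFor(Map<String, String> row) {
        String value = row.get(sourceColumn);
        if (value == null) {
            return Optional.empty();
        }
        Matcher m = pattern.matcher(value);
        if (!m.find()) {
            return Optional.empty();
        }
        // Prefer the first capture group if it took part in the match,
        // otherwise fall back to the whole matched text.
        String captured = m.groupCount() >= 1 ? m.group(1) : null;
        return Optional.of(captured != null ? captured : m.group());
    }
}
```

A scanner would call newValueFor on each row it reads and issue an UPDATE only when the result is present, so rows that do not match are never rewritten.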

Seems like this wouldn’t be too hard.  Just write a daemon that looks at
the key distribution and runs a scan on the data closest to it.  Ideally
it would run as a separate process, so that you couldn’t accidentally
read all that data into memory and then OOM the Cassandra daemon.
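
One way to keep each scan bounded is to walk the token ring in slices rather than issuing a single unbounded SELECT. The sketch below assumes the cluster uses Murmur3Partitioner, whose tokens are signed 64-bit values, and splits that space into evenly sized closed ranges, then builds a CQL query for each one using the token() function. The class TokenRangeSplitter and the table and key names passed to it are hypothetical. It does not look at which node owns which range, so the "data closest to it" part is left out.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits the Murmur3 token space into closed
// [start, end] ranges and builds one CQL scan query per range.
final class TokenRangeSplitter {
    private static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

    // Returns `splits` contiguous, non-overlapping ranges that together
    // cover every signed 64-bit token. Each element is {start, end}.
    static List<long[]> split(int splits) {
        if (splits < 1) {
            throw new IllegalArgumentException("splits must be >= 1");
        }
        BigInteger width = MAX.subtract(MIN); // 2^64 - 1
        BigInteger n = BigInteger.valueOf(splits);
        List<long[]> ranges = new ArrayList<>();
        BigInteger start = MIN;
        for (int i = 1; i <= splits; i++) {
            // The last range always ends exactly at MAX.
            BigInteger end = (i == splits)
                    ? MAX
                    : MIN.add(width.multiply(BigInteger.valueOf(i)).divide(n));
            ranges.add(new long[] {start.longValueExact(), end.longValueExact()});
            start = end.add(BigInteger.ONE);
        }
        return ranges;
    }

    // Builds the CQL for one slice. Table and key names are supplied by
    // the caller and are not validated here.
    static String scanQuery(String table, String partitionKey, long[] range) {
        return "SELECT * FROM " + table
                + " WHERE token(" + partitionKey + ") >= " + range[0]
                + " AND token(" + partitionKey + ") <= " + range[1];
    }
}
```

A scanner process could run these queries one at a time (or a few in parallel) from outside the Cassandra JVM, which keeps any memory pressure in the client rather than in the database daemon.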

Does this already exist?

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

Re: simple map / table scans without hadoop?

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Sep 26, 2014 at 9:08 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> I have the requirements to periodically run full tables scans on our
> data.  It’s mostly for repair tasks or making bulk UPDATEs… but I’d prefer
> to do it in Java because I need something mildly trivial.
>

http://wiki.apache.org/cassandra/FAQ#iter_world

?

=Rob