Posted to dev@cassandra.apache.org by William Katsak <wk...@cs.rutgers.edu> on 2013/02/04 21:11:50 UTC
Question about Streaming
Hello,
I am working on some modifications of Cassandra in an academic setting
(research code, not for a class), and have a question regarding bulk
streaming of data across the network (i.e., between nodes).
Assume that I have a known set of key/column family combinations that are
good/current on node A and stale on node B (forget about hinted handoff,
etc.; assume that this mechanism isn't being used). I can obviously bring
these up to date on B by running anti-entropy repair, but that checks all
of the data and is CPU- and time-intensive. I have written code that brings
this data up to date using the same mechanism as read repair (i.e., one
item at a time), and this works fine, but it is inefficient.
What I am interested in is something in between. I want to bulk stream a
series of updates between nodes the way anti-entropy repair does, but I
want the data that is sent to be limited to the specific itemized set I am
interested in.
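As a rough illustration of the "in between" idea, here is a minimal Java sketch that batches a tracked set of stale key/column family entries so that each batch could be handed to one bulk transfer instead of being repaired one item at a time. Everything in it is hypothetical: `StaleSetBatcher`, `StaleItem` and `batchByColumnFamily` are not Cassandra classes. The grouping by keyspace and column family is an assumption, and how a batch would actually be passed to Cassandra's streaming code is left open.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: none of these types exist in Cassandra.
public class StaleSetBatcher {

    // One tracked item: current on node A, stale on node B.
    record StaleItem(String keyspace, String columnFamily, String key) {}

    // Groups stale keys under "keyspace/columnFamily" so each group can be
    // sent as a single bulk transfer rather than one request per item.
    // Duplicate entries for the same key collapse into one set element.
    static Map<String, Set<String>> batchByColumnFamily(List<StaleItem> items) {
        Map<String, Set<String>> batches = new TreeMap<>();
        for (StaleItem item : items) {
            String cf = item.keyspace() + "/" + item.columnFamily();
            batches.computeIfAbsent(cf, k -> new TreeSet<>()).add(item.key());
        }
        return batches;
    }

    public static void main(String[] args) {
        List<StaleItem> stale = List.of(
            new StaleItem("ks1", "users", "alice"),
            new StaleItem("ks1", "users", "bob"),
            new StaleItem("ks1", "events", "e42"),
            new StaleItem("ks1", "users", "alice")); // duplicate entry

        // Four tracked entries become two per-column-family batches.
        batchByColumnFamily(stale)
            .forEach((cf, keys) -> System.out.println(cf + " -> " + keys));
    }
}
```

Running `main` prints `ks1/events -> [e42]` followed by `ks1/users -> [alice, bob]`. A `TreeMap` and `TreeSet` are used only to make the batch order deterministic; any map/set would do.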
Is this possible with the existing code, assuming that I already have code
that keeps track of this set of stale data?
Advice is much appreciated.
Sincerely,
Bill Katsak
Rutgers University
Re: Question about Streaming
Posted by Brandon Williams <dr...@gmail.com>.
On Mon, Feb 4, 2013 at 2:11 PM, William Katsak <wk...@cs.rutgers.edu> wrote:
> What I am interested in doing is something in between. I want to bulk stream
> a series of updates between nodes like anti-entropy does, but I want the
> data that is sent to only be part of the specific itemized set that I am
> interested in.
If all you want is a ks/cf-specific (keyspace/column family) version of
'nodetool rebuild', then that is a good place to start.
-Brandon