Posted to user@cassandra.apache.org by Ran Tavory <ra...@gmail.com> on 2010/05/31 14:23:23 UTC

nodetool cleanup isn't cleaning up?

I hope I understand nodetool cleanup correctly - it should clean up all data
that does not (currently) belong to this node. If so, I think it might not
be working correctly.

Look at nodes 192.168.252.124 and 192.168.252.99 below

192.168.252.99   Up   279.35 MB   3544607988759775661076818827414252202     |<--|
192.168.252.124  Up   167.23 MB   56713727820156410577229101238628035242    |   ^
192.168.252.125  Up   82.91 MB    85070591730234615865843651857942052863    v   |
192.168.254.57   Up   366.6 MB    113427455640312821154458202477256070485   |   ^
192.168.254.58   Up   88.44 MB    141784319550391026443072753096570088106   v   |
192.168.254.59   Up   88.45 MB    170141183460469231731687303715884105727   |-->|

I wanted 124 to take all the load from 99. So I issued a move command.

$ nodetool -h cass99 -p 9004 move 56713727820156410577229101238628035243


This command tells 99 to take the space between
(56713727820156410577229101238628035242,
56713727820156410577229101238628035243],
which is basically just one item in the token space, almost nothing... I
wanted it to be very slim (just playing around).

So, next I get this:

192.168.252.124  Up   803.33 MB   56713727820156410577229101238628035242    |<--|
192.168.252.99   Up   352.85 MB   56713727820156410577229101238628035243    |   ^
192.168.252.125  Up   134.24 MB   85070591730234615865843651857942052863    v   |
192.168.254.57   Up   676.41 MB   113427455640312821154458202477256070485   |   ^
192.168.254.58   Up   99.74 MB    141784319550391026443072753096570088106   v   |
192.168.254.59   Up   99.94 MB    170141183460469231731687303715884105727   |-->|

The tokens are correct, but it seems that 99 still has a lot of data. Why?
OK, that might be because it didn't delete its moved data.
So next I issued a nodetool cleanup, which should have taken care of that.
Only it didn't: node 99 still has 352 MB of data. Why?
So, you know what, I waited for 1h. Still no good; the data wasn't cleaned up.
I restarted the server. Still, the data wasn't cleaned up... I issued a
cleanup again... still no good... what's up with this node?
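
The load figures above come from nodetool ring. A minimal sketch for
cross-checking them against what is actually on disk (the data path is an
assumption for a default install):

$ nodetool -h cass99 -p 9004 ring       # token layout and per-node load
$ du -sh /var/lib/cassandra/data/*      # raw on-disk size per keyspace; path is an assumption
$ nodetool -h cass99 -p 9004 cleanup    # then compare the du output once cleanup finishes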

Re: nodetool cleanup isn't cleaning up?

Posted by Ran Tavory <ra...@gmail.com>.
getRangeToEndpointMap is very useful, thanks, I didn't know about it...
however, I've reconfigured my cluster since (moved some nodes and tokens),
so now the problem is gone. I guess I'll use getRangeToEndpointMap the next
time I see something like this...


Re: nodetool cleanup isn't cleaning up?

Posted by Jonathan Ellis <jb...@gmail.com>.
Then the next step is to check StorageService.getRangeToEndpointMap via JMX
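
For example, with a generic JMX client such as jmxterm (a sketch only; the
operation is named as above, but the exact casing/signature and the keyspace
argument are assumptions that can vary by version):

$ java -jar jmxterm.jar -l cass99:9004
$> bean org.apache.cassandra.service:type=StorageService
$> run getRangeToEndpointMap Keyspace1    # keyspace name is a placeholder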


Re: nodetool cleanup isn't cleaning up?

Posted by Ran Tavory <ra...@gmail.com>.
I'm using RackAwareStrategy. But it still doesn't make sense, I think...
let's see what I missed...
According to http://wiki.apache.org/cassandra/Operations


  - RackAwareStrategy: replica 2 is placed in the first node along the ring
    that belongs in *another* data center than the first; the remaining N-2
    replicas, if any, are placed on the first nodes along the ring in the
    *same* rack as the first



192.168.252.124  Up   803.33 MB   56713727820156410577229101238628035242    |<--|
192.168.252.99   Up   352.85 MB   56713727820156410577229101238628035243    |   ^
192.168.252.125  Up   134.24 MB   85070591730234615865843651857942052863    v   |
192.168.254.57   Up   676.41 MB   113427455640312821154458202477256070485   |   ^
192.168.254.58   Up   99.74 MB    141784319550391026443072753096570088106   v   |
192.168.254.59   Up   99.94 MB    170141183460469231731687303715884105727   |-->|

Alright, so I made a mistake and didn't use the alternate-datacenter
suggestion on the page, so the first node of every DC is overloaded with
replicas. However, the current situation still doesn't make sense to me.
.252.124 will be overloaded b/c it has the first token in the .252 DC.
.254.57 will also be overloaded since it has the first token in the .254 DC.
But for which node does 252.99 hold replicas? It's not the first in its DC,
and it's just one single token past its predecessor (which is in the same
DC).
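
Working the quoted rule through this ring with RF = 2 (and assuming the
snitch really does put 192.168.252.x and 192.168.254.x in different data
centers; with the default snitch that is an assumption): the first node
along the ring in another DC after each of .254.57/.58/.59 is .252.124, and
after each of .252.124/.99/.125 it is .254.57. Under that rule, .124 and
.57 carry every second replica between them (which fits their 803 MB and
676 MB loads), and no range's second replica lands on .99 at all.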


Re: nodetool cleanup isn't cleaning up?

Posted by Jonathan Ellis <jb...@gmail.com>.
I'm saying that .99 is getting a copy of all the data for which .124
is the primary.  (If you are using RackUnawareStrategy.  If you are
using RackAwareStrategy it is some other node.)


Re: nodetool cleanup isn't cleaning up?

Posted by Ran Tavory <ra...@gmail.com>.
ok, let me try and translate your answer ;)

Are you saying that the data left on the node is non-primary replicas of
rows from before the move?
That would imply that when a node moves in the ring, it affects the
distribution of:
- new keys
- old keys' primary nodes
-- but not the distribution of old keys' non-primary replicas.

If so, I still don't understand something... I would expect even the
non-primary replicas of keys to be moved, since if they aren't, how would
they be found? Upon reads, the serving node shouldn't care whether a row is
new or old; it should have a consistent and global mapping of tokens. So I
guess that ruins my theory...
What did you mean, then? Is this about deletions of non-primary replicated
data? And how does the replication factor affect the load on the moved host?


Re: nodetool cleanup isn't cleaning up?

Posted by Jonathan Ellis <jb...@gmail.com>.
well, there you are then.


Re: nodetool cleanup isn't cleaning up?

Posted by Ran Tavory <ra...@gmail.com>.
yes, replication factor = 2


Re: nodetool cleanup isn't cleaning up?

Posted by Jonathan Ellis <jb...@gmail.com>.
you have replication factor > 1 ?


Re: nodetool cleanup isn't cleaning up?

Posted by Maxim Kramarenko <ma...@trackstudio.com>.
Hello!

I think (but I'm not sure, please correct me if needed) that after you
change a token, nodes just receive the new data but don't immediately
delete the old data. It seems like "cleanup" will mark it with tombstones,
and it will be deleted when you run "compact" after GCGraceSeconds seconds.
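
If that's right, the sequence on 99 would be something like this (a sketch;
same host/port flags as the earlier commands):

$ nodetool -h cass99 -p 9004 cleanup
$ # ... wait at least GCGraceSeconds ...
$ nodetool -h cass99 -p 9004 compact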


Re: nodetool cleanup isn't cleaning up?

Posted by Ran Tavory <ra...@gmail.com>.
Do you think it's the tombstones that take up the disk space?
Shouldn't the tombstones be moved along with the data?


Re: nodetool cleanup isn't cleaning up?

Posted by Maxim Kramarenko <ma...@trackstudio.com>.
Hello!

You likely need to wait for GCGraceSeconds, or modify this param.

http://spyced.blogspot.com/2010/02/distributed-deletes-in-cassandra.html
===
Thus, a delete operation can't just wipe out all traces of the data 
being removed immediately: if we did, and a replica did not receive the 
delete operation, when it becomes available again it will treat the 
replicas that did receive the delete as having missed a write update, 
and repair them! So, instead of wiping out data on delete, Cassandra 
replaces it with a special value called a tombstone. The tombstone can 
then be propagated to replicas that missed the initial remove request.
...
Here, we defined a constant, GCGraceSeconds, and had each node track 
tombstone age locally. Once it has aged past the constant, it can be GC'd.
===
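
For reference, GCGraceSeconds is a per-node setting; in a 0.6-era
storage-conf.xml it would look something like this (the element name and
the 864000-second default are from memory, so treat them as assumptions for
your version):

<GcGraceSeconds>864000</GcGraceSeconds>  <!-- tombstones become GC-able after ~10 days -->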



-- 
Best regards,
  Maxim <maximkr@trackstudio.com>

LinkedIn Profile: http://www.linkedin.com/in/maximkr
Google Talk/Jabber: maximkr@gmail.com
ICQ number: 307863079
Skype Chat: maxim.kramarenko
Yahoo! Messenger: maxim_kramarenko