Posted to user@cassandra.apache.org by Arya Goudarzi <ag...@gaiaonline.com> on 2010/07/14 02:06:29 UTC

Re: nodetool loadbalance : Streams Continue on Non Acceptance of New Token

Hi Gary,

Thanks for the reply. I tried this again today. The streams still get stuck; please see my comment:

https://issues.apache.org/jira/browse/CASSANDRA-1221

-arya

----- Original Message -----
From: "Gary Dusbabek" <gd...@gmail.com>
To: user@cassandra.apache.org
Sent: Wednesday, June 23, 2010 5:40:02 AM
Subject: Re: nodetool loadbalance : Streams Continue on Non Acceptance of New Token

On Tue, Jun 22, 2010 at 20:16, Arya Goudarzi <ag...@gaiaonline.com> wrote:
> Hi,
>
> Please confirm whether this is an issue that should be reported, or whether I am doing something wrong. I could not find anything relevant in JIRA:
>
> Playing with the 0.7 nightly (today's build), I set up a 3-node cluster this way:
>
>  - Added one node;
>  - Loaded default schema with RF 1 from YAML using JMX;
>  - Loaded 2M keys using py_stress;
>  - Bootstrapped a second node;
>  - Cleaned up the first node;
>  - Bootstrapped a third node;
>  - Cleaned up the second node;
>
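> Roughly, the commands behind the steps above looked like this (hostnames, paths, and the stress flags are from memory, so treat this as a sketch rather than the exact invocations):
>
>   # start the first node, then load the default schema from cassandra.yaml
>   # over JMX (I used the schema-loading operation on the StorageService MBean in jconsole)
>   bin/cassandra
>
>   # insert ~2M keys with the bundled python stress tool
>   python contrib/py_stress/stress.py -o insert -n 2000000 -d 10.50.26.132
>
>   # after each new node finished bootstrapping, clean up the previous one
>   nodetool --host 10.50.26.132 cleanup
>   nodetool --host 10.50.26.133 cleanup
>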
> I got the following ring:
>
> Address       Status     Load          Range                                      Ring
>                                       154293670372423273273390365393543806425
> 10.50.26.132  Up         518.63 MB     69164917636305877859094619660693892452     |<--|
> 10.50.26.134  Up         234.8 MB      111685517405103688771527967027648896391    |   |
> 10.50.26.133  Up         235.26 MB     154293670372423273273390365393543806425    |-->|
>
> Now I ran:
>
> nodetool --host 10.50.26.132 loadbalance
>
> It's been running for a while. I checked the streams:
>
> nodetool --host 10.50.26.134 streams
> Mode: Normal
> Not sending any streams.
> Streaming from: /10.50.26.132
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-3-Data.db/[(0,22206096), (22206096,27271682)]
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-4-Data.db/[(0,15180462), (15180462,18656982)]
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-5-Data.db/[(0,353139829), (353139829,433883659)]
>   Keyspace1: /var/lib/cassandra/data/Keyspace1/Standard1-tmp-d-6-Data.db/[(0,366336059), (366336059,450095320)]
>
> nodetool --host 10.50.26.132 streams
> Mode: Leaving: streaming data to other nodes
> Streaming to: /10.50.26.134
>   /var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]
> Not receiving any streams.
>
> These have been going for the past 2 hours.
>
> I looked in the logs of the node with the 134 IP address and saw this:
>
> INFO [GOSSIP_STAGE:1] 2010-06-22 16:30:54,679 StorageService.java (line 603) Will not change my token ownership to /10.50.26.132

A node will give this message when it sees another node (usually for
the first time) that is trying to claim the same token but whose
startup time is much earlier (i.e., this isn't a token replacement).
It would follow that you would see this during a rebalance.
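
If you want to double-check which node gossip considers older, comparing `nodetool info` on each host is one quick way to do it (hosts here are just the ones from your ring; the field name is from memory). The gossip generation is derived from the node's startup time, so the smaller number is the earlier start:

  nodetool --host 10.50.26.132 info
  nodetool --host 10.50.26.134 info
  # compare the "Generation No" lines: the generation is a startup
  # timestamp in seconds, so the smaller value started earlier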

>
> So, to my understanding from the wiki, loadbalance is supposed to decommission the node by streaming its ranges to other nodes and then bootstrap it again. It has been stuck in streaming for the past 2 hours and the size of the ring has not changed. The log on the first node says it started streaming hours ago:
>
> INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 72) Beginning transfer process to /10.50.26.134 for ranges (154293670372423273273390365393543806425,69164917636305877859094619660693892452]
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,255 StreamOut.java (line 82) Flushing memtables for Keyspace1...
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,266 StreamOut.java (line 128) Stream context metadata [/var/lib/cassandra/data/Keyspace1/Standard1-d-48-Data.db/[(0,366336059), (366336059,450095320)]] 1 sstables.
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 135) Sending a stream initiate message to /10.50.26.134 ...
>  INFO [STREAM-STAGE:1] 2010-06-22 16:35:56,267 StreamOut.java (line 140) Waiting for transfer to /10.50.26.134 to complete
>  INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 359) LocationInfo has reached its threshold; switching in a fresh Memtable at CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1277249454413.log', position=720)
>  INFO [FLUSH-TIMER] 2010-06-22 17:36:53,370 ColumnFamilyStore.java (line 622) Enqueuing flush of Memtable(LocationInfo)@1637794189
>  INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,370 Memtable.java (line 149) Writing Memtable(LocationInfo)@1637794189
>  INFO [FLUSH-WRITER-POOL:1] 2010-06-22 17:36:53,528 Memtable.java (line 163) Completed flushing /var/lib/cassandra/data/system/LocationInfo-d-9-Data.db
>  INFO [MEMTABLE-POST-FLUSHER:1] 2010-06-22 17:36:53,529 ColumnFamilyStore.java (line 374) Discarding 1000
>
>
> Nothing more after this line.
>
> Am I doing something wrong?

If the output you get from `nodetool streams` isn't changing, then I'd
say we have a bug. Your data sizes weren't that large; I'd expect 2
hours to be more than enough time.
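
A quick way to confirm nothing is moving (sketch only; adjust the host): snapshot the streams output, wait a few minutes, and diff it against a fresh run:

  nodetool --host 10.50.26.132 streams > /tmp/streams-132.before
  sleep 300
  nodetool --host 10.50.26.132 streams | diff /tmp/streams-132.before -
  # empty diff output means the byte offsets haven't advanced at all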

I've created https://issues.apache.org/jira/browse/CASSANDRA-1221 to
track this problem.

Gary.

>
>
> Best Regards,
> -Arya
>
>