Posted to user@cassandra.apache.org by Edmond Lau <ed...@ooyala.com> on 2009/10/28 05:10:52 UTC

on bootstrapping a node

I'd like to improve my mental model of how Cassandra bootstrapping
works.  My understanding is that bootstrapping is just an extra step
during a node's startup where the node copies data from neighboring
nodes that, according to its token, it should own; afterwards, the
node behaves like any other node.
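
To make that picture concrete, here is a toy sketch of the ownership rule as
I understand it (illustrative Java only, not code from Cassandra; the tokens
are just example values):

import java.math.BigInteger;
import java.util.SortedSet;
import java.util.TreeSet;

public class RingOwnershipSketch {
    public static void main(String[] args) {
        // Tokens already present in the ring (example values).
        SortedSet<BigInteger> ring = new TreeSet<BigInteger>();
        ring.add(new BigInteger("30741330848943310678704865619376516001"));
        ring.add(new BigInteger("129730098012431089662630620415811546756"));

        // A hypothetical token for the joining node.
        BigInteger newToken = new BigInteger("90000000000000000000000000000000000000");

        // The new node takes over (previous token, its own token]; bootstrap
        // would copy that range from whichever neighbors hold it today.
        SortedSet<BigInteger> smaller = ring.headSet(newToken);
        BigInteger prev = smaller.isEmpty() ? ring.last() : smaller.last();
        System.out.println("new node owns (" + prev + ", " + newToken + "]");
    }
}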

If that's correct, I have a few follow-up questions:

- At what point does the new node get inserted into the hash ring so
that reads/writes for keys get directed to it?
- What are the semantics of bootstrapping a node that's been in the
cluster before and already has some data that's possibly outdated?
Should this work?  This might be useful if a node's been out of
commission for a sufficiently long period of time.
- If we pick a poor set of initial tokens, would it be sensible to
modify the tokens on existing nodes and then restart them with
bootstrapping in order to rebalance?
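
(On the last question: by a "poor set of initial tokens" I mean tokens that
are far from evenly spaced.  The arithmetic I'd use to pick an even spread
for the RandomPartitioner, assuming its 0..2^127 token space, is roughly the
following; again this is only illustrative, not code from Cassandra:)

import java.math.BigInteger;

public class BalancedTokens {
    public static void main(String[] args) {
        int nodeCount = 3; // hypothetical cluster size
        BigInteger ringSize = BigInteger.valueOf(2).pow(127);
        for (int i = 0; i < nodeCount; i++) {
            // evenly spaced: token_i = i * 2^127 / nodeCount
            BigInteger token = ringSize.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(nodeCount));
            System.out.println("node " + i + ": " + token);
        }
    }
}

The question is whether moving existing nodes onto tokens like these and
restarting them with bootstrapping is a supported way to rebalance.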

I've also noticed that I can get my Cassandra cluster into a weird
state via bootstrapping, where it stops accepting reads/writes.  I'm
on Cassandra 0.4.1.  A simple repro case is to start all 3 nodes of a
3 node cluster (replication factor of 2) using bootstrapping.  Getting
a key that I've inserted then leads to an IndexOutOfBoundsException.
Another IndexOutOfBoundsException was thrown later while flushing.

DEBUG [pool-1-thread-2] 2009-10-28 02:18:19,907 CassandraServer.java
(line 258) get
DEBUG [pool-1-thread-2] 2009-10-28 02:18:19,908 CassandraServer.java
(line 307) multiget
ERROR [pool-1-thread-2] 2009-10-28 02:18:19,912 Cassandra.java (line
647) Internal error processing get
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
        at java.util.ArrayList.get(ArrayList.java:322)
        at org.apache.cassandra.locator.RackUnawareStrategy.getStorageTokens(RackUnawareStrategy.java:99)
        at org.apache.cassandra.locator.RackUnawareStrategy.getReadStorageEndPoints(RackUnawareStrategy.java:68)
        at org.apache.cassandra.locator.RackUnawareStrategy.getReadStorageEndPoints(RackUnawareStrategy.java:45)
        at org.apache.cassandra.service.StorageService.getReadStorageEndPoints(StorageService.java:949)
        at org.apache.cassandra.service.StorageProxy.readProtocol(StorageProxy.java:296)
        at org.apache.cassandra.service.CassandraServer.readColumnFamily(CassandraServer.java:100)
        at org.apache.cassandra.service.CassandraServer.multigetColumns(CassandraServer.java:271)
        at org.apache.cassandra.service.CassandraServer.multigetInternal(CassandraServer.java:325)
        at org.apache.cassandra.service.CassandraServer.multiget(CassandraServer.java:308)
        at org.apache.cassandra.service.CassandraServer.get(CassandraServer.java:259)
        at org.apache.cassandra.service.Cassandra$Processor$get.process(Cassandra.java:639)
        at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:627)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
 INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-28 02:18:45,719
ColumnFamilyStore.java (line 369) LocationInfo has reached its
threshold; switching in a fresh Memtable
 INFO [PERIODIC-FLUSHER-POOL:1] 2009-10-28 02:18:45,720
ColumnFamilyStore.java (line 1178) Enqueuing flush of
Memtable(LocationInfo)@1048641931
 INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-28 02:18:45,721 Memtable.java
(line 186) Flushing Memtable(LocationInfo)@1048641931
DEBUG [COMMIT-LOG-WRITER] 2009-10-28 02:18:45,877 CommitLog.java (line
466) discard completed log segments for
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1256696265813.log',
position=423), column family 4. CFIDs are Keyspace1:
TableMetadata(Standard2: 1, Super1: 0, Standard1: 2, StandardByUUID1: 3, }),
system: TableMetadata(LocationInfo: 4, HintsColumnFamily: 5, }),
Analytics: TableMetadata(total: 6, domain: 7, movie: 8, provider: 9,
country: 10, }), }
DEBUG [COMMIT-LOG-WRITER] 2009-10-28 02:18:45,878 CommitLog.java (line
509) Marking replay position 423 on commit log
/var/lib/cassandra/commitlog/CommitLog-1256696265813.log
 INFO [MEMTABLE-FLUSHER-POOL:1] 2009-10-28 02:18:45,878 Memtable.java
(line 220) Completed flushing
/var/lib/cassandra/data/system/LocationInfo-1-Data.db
DEBUG [BOOT-STRAPPER:1] 2009-10-28 02:18:45,954 BootStrapper.java
(line 83) Exception was generated at : 10/28/2009 02:18:45 on thread
BOOT-STRAPPER:1
-1
java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.get(ArrayList.java:324)
        at org.apache.cassandra.service.StorageService.getAllRanges(StorageService.java:886)
        at org.apache.cassandra.dht.BootStrapper.getRangesWithSourceTarget(BootStrapper.java:98)
        at org.apache.cassandra.dht.BootStrapper.run(BootStrapper.java:73)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Edmond

Re: on bootstrapping a node

Posted by Sandeep Tata <sa...@gmail.com>.
Does http://issues.apache.org/jira/browse/CASSANDRA-501 help with 0.4.1 ?

I haven't done much testing, but it seemed to fix the problem for a
simple 1 node cluster --> 2 node cluster test.

On Thu, Oct 29, 2009 at 4:42 PM, Edmond Lau <ed...@ooyala.com> wrote:
> I'm not able to bootstrap a new node on either 0.4.1 or trunk.  I
> started up a simple 2 node cluster with a replication factor of 2 and
> then bootstrapped a 3rd (using -b in 0.4.1 and AutoBootstrap in
> trunk).
>
> In 0.4.1, I do observe some writes going to the new node as expected,
> but then the BOOT-STRAPPER thread throws an NPE and the node never
> shows up in nodeprobe ring.  I believe this is fixed in CASSANDRA-425:
>
> DEBUG [BOOT-STRAPPER:1] 2009-10-29 22:56:41,272 BootStrapper.java
> (line 100) Total number of old ranges 2
> DEBUG [BOOT-STRAPPER:1] 2009-10-29 22:56:41,274 BootStrapper.java
> (line 83) Exception was generated at : 10/29/2009 22:56:41 on thread
> BOOT-STRAPPER:1
>
> java.lang.NullPointerException
>        at org.apache.cassandra.dht.Range.contains(Range.java:105)
>        at org.apache.cassandra.dht.LeaveJoinProtocolHelper.getRangeSplitRangeMapping(LeaveJoinProtocolHelper.java:72)
>        at org.apache.cassandra.dht.BootStrapper.getRangesWithSourceTarget(BootStrapper.java:105)
>        at org.apache.cassandra.dht.BootStrapper.run(BootStrapper.java:73)
>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>
> On trunk, the 3rd node never receives any writes and just sits there
> doing nothing.  It also never shows up on the nodeprobe ring:
>
>  INFO [main] 2009-10-29 23:15:24,934 StorageService.java (line 264)
> Starting in bootstrap mode (first, sleeping to get load information)
>  INFO [GMFD:1] 2009-10-29 23:15:26,423 Gossiper.java (line 634) Node
> /172.16.130.130 has now joined.
> DEBUG [GMFD:1] 2009-10-29 23:15:26,424 StorageService.java (line 389)
> CHANGE IN STATE FOR /172.16.130.130 - has token
> 129730098012431089662630620415811546756
>  INFO [GMFD:1] 2009-10-29 23:15:26,426 Gossiper.java (line 634) Node
> /172.16.130.129 has now joined.
> DEBUG [GMFD:1] 2009-10-29 23:15:26,426 StorageService.java (line 389)
> CHANGE IN STATE FOR /172.16.130.129 - has token
> 30741330848943310678704865619376516001
> DEBUG [Timer-0] 2009-10-29 23:15:26,930 LoadDisseminator.java (line
> 39) Disseminating load info ...
> DEBUG [GMFD:1] 2009-10-29 23:18:39,451 StorageService.java (line 434)
> InetAddress /172.16.130.130 just recovered from a partition. Sending
> hinted data.
> DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,454
> HintedHandOffManager.java (line 186) Started hinted handoff for
> endPoint /172.16.130.130
> DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,456
> HintedHandOffManager.java (line 225) Finished hinted handoff for
> endpoint /172.16.130.130
> DEBUG [GMFD:1] 2009-10-29 23:18:39,954 StorageService.java (line 434)
> InetAddress /172.16.130.129 just recovered from a partition. Sending
> hinted data.
> DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,955
> HintedHandOffManager.java (line 186) Started hinted handoff for
> endPoint /172.16.130.129
> DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,956
> HintedHandOffManager.java (line 225) Finished hinted handoff for
> endpoint /172.16.130.129
>
> Bootstrapping the 3rd node after manually giving it an initial token
> led to an AssertionError:
>
>  INFO [main] 2009-10-29 23:25:11,720 SystemTable.java (line 125) Saved
> Token not found. Using 0
> DEBUG [main] 2009-10-29 23:25:11,878 MessagingService.java (line 203)
> Starting to listen on v31.vv.prod.ooyala.com/172.16.130.131
>  INFO [main] 2009-10-29 23:25:11,933 StorageService.java (line 264)
> Starting in bootstrap mode (first, sleeping to get load information)
>  INFO [GMFD:1] 2009-10-29 23:25:13,679 Gossiper.java (line 634) Node
> /172.16.130.130 has now joined.
> DEBUG [GMFD:1] 2009-10-29 23:25:13,680 StorageService.java (line 389)
> CHANGE IN STATE FOR /172.16.130.130 - has token
> 50846833567878089067494666696176925951
>  INFO [GMFD:1] 2009-10-29 23:25:13,682 Gossiper.java (line 634) Node
> /172.16.130.129 has now joined.
> DEBUG [GMFD:1] 2009-10-29 23:25:13,682 StorageService.java (line 389)
> CHANGE IN STATE FOR /172.16.130.129 - has token
> 44233547425983959380881840716972243602
> DEBUG [Timer-0] 2009-10-29 23:25:13,929 LoadDisseminator.java (line
> 39) Disseminating load info ...
> ERROR [main] 2009-10-29 23:25:43,754 CassandraDaemon.java (line 184)
> Exception encountered during startup.
> java.lang.AssertionError
>        at org.apache.cassandra.dht.BootStrapper.<init>(BootStrapper.java:84)
>        at org.apache.cassandra.service.StorageService.start(StorageService.java:267)
>        at org.apache.cassandra.service.CassandraServer.start(CassandraServer.java:72)
>        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:94)
>        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:166)
>
> Thoughts?
>
> Edmond
>
> On Wed, Oct 28, 2009 at 2:24 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> On Wed, Oct 28, 2009 at 1:15 PM, Edmond Lau <ed...@ooyala.com> wrote:
>>> Sounds reasonable.  Until CASSANDRA-435 is complete, there's no way
>>> currently to take down a node and have it be removed from the list of
>>> nodes that's responsible for the data in its token range, correct?
>>> All other nodes will just assume that it's temporarily unavailable?
>>
>> Right.
>>
>>> Assume that we had the ability to permanently remove a node.  Would
>>> modifying the token on an existing node and restarting it with
>>> bootstrapping somehow be incorrect, or merely not performant b/c we'll
>>> be performing lazy repair on most reads until the node is up to date?
>>
>> If you permanently remove a node, wipe its data directory, and restart
>> it, it's effectively a new node, so everything works fine.  If you
>> don't wipe its data directory it won't bootstrap (and it will ignore a
>> new token in the configuration file in favor of the one it stored in
>> the system table) since it will say "hey, I must have crashed and
>> restarted.  Here I am again guys!"
>>
>> Bootstrap is for new nodes.  Don't try to be too clever. :)
>>
>>> if I wanted to
>>> migrate my cluster to a completely new set of machines.  I would then
>>> bootstrap all the new nodes in the new cluster, and then decommission
>>> my old nodes one by one (assuming
>>> https://issues.apache.org/jira/browse/CASSANDRA-435 was done).  After
>>> the migration, all my nodes would've been bootstrapped.
>>
>> Sure.
>>
>> -Jonathan
>>
>

Re: on bootstrapping a node

Posted by Edmond Lau <ed...@ooyala.com>.
I'm not able to bootstrap a new node on either 0.4.1 or trunk.  I
started up a simple 2 node cluster with a replication factor of 2 and
then bootstrapped a 3rd (using -b in 0.4.1 and AutoBootstrap in
trunk).

In 0.4.1, I do observe some writes going to the new node as expected,
but then the BOOT-STRAPPER thread throws an NPE and the node never
shows up in nodeprobe ring.  I believe this is fixed in CASSANDRA-425:

DEBUG [BOOT-STRAPPER:1] 2009-10-29 22:56:41,272 BootStrapper.java
(line 100) Total number of old ranges 2
DEBUG [BOOT-STRAPPER:1] 2009-10-29 22:56:41,274 BootStrapper.java
(line 83) Exception was generated at : 10/29/2009 22:56:41 on thread
BOOT-STRAPPER:1

java.lang.NullPointerException
        at org.apache.cassandra.dht.Range.contains(Range.java:105)
        at org.apache.cassandra.dht.LeaveJoinProtocolHelper.getRangeSplitRangeMapping(LeaveJoinProtocolHelper.java:72)
        at org.apache.cassandra.dht.BootStrapper.getRangesWithSourceTarget(BootStrapper.java:105)
        at org.apache.cassandra.dht.BootStrapper.run(BootStrapper.java:73)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

On trunk, the 3rd node never receives any writes and just sits there
doing nothing.  It also never shows up on the nodeprobe ring:

 INFO [main] 2009-10-29 23:15:24,934 StorageService.java (line 264)
Starting in bootstrap mode (first, sleeping to get load information)
 INFO [GMFD:1] 2009-10-29 23:15:26,423 Gossiper.java (line 634) Node
/172.16.130.130 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:15:26,424 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.130 - has token
129730098012431089662630620415811546756
 INFO [GMFD:1] 2009-10-29 23:15:26,426 Gossiper.java (line 634) Node
/172.16.130.129 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:15:26,426 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.129 - has token
30741330848943310678704865619376516001
DEBUG [Timer-0] 2009-10-29 23:15:26,930 LoadDisseminator.java (line
39) Disseminating load info ...
DEBUG [GMFD:1] 2009-10-29 23:18:39,451 StorageService.java (line 434)
InetAddress /172.16.130.130 just recovered from a partition. Sending
hinted data.
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,454
HintedHandOffManager.java (line 186) Started hinted handoff for
endPoint /172.16.130.130
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,456
HintedHandOffManager.java (line 225) Finished hinted handoff for
endpoint /172.16.130.130
DEBUG [GMFD:1] 2009-10-29 23:18:39,954 StorageService.java (line 434)
InetAddress /172.16.130.129 just recovered from a partition. Sending
hinted data.
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,955
HintedHandOffManager.java (line 186) Started hinted handoff for
endPoint /172.16.130.129
DEBUG [HINTED-HANDOFF-POOL:1] 2009-10-29 23:18:39,956
HintedHandOffManager.java (line 225) Finished hinted handoff for
endpoint /172.16.130.129

Bootstrapping the 3rd node after manually giving it an initial token
led to an AssertionError:

 INFO [main] 2009-10-29 23:25:11,720 SystemTable.java (line 125) Saved
Token not found. Using 0
DEBUG [main] 2009-10-29 23:25:11,878 MessagingService.java (line 203)
Starting to listen on v31.vv.prod.ooyala.com/172.16.130.131
 INFO [main] 2009-10-29 23:25:11,933 StorageService.java (line 264)
Starting in bootstrap mode (first, sleeping to get load information)
 INFO [GMFD:1] 2009-10-29 23:25:13,679 Gossiper.java (line 634) Node
/172.16.130.130 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:25:13,680 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.130 - has token
50846833567878089067494666696176925951
 INFO [GMFD:1] 2009-10-29 23:25:13,682 Gossiper.java (line 634) Node
/172.16.130.129 has now joined.
DEBUG [GMFD:1] 2009-10-29 23:25:13,682 StorageService.java (line 389)
CHANGE IN STATE FOR /172.16.130.129 - has token
44233547425983959380881840716972243602
DEBUG [Timer-0] 2009-10-29 23:25:13,929 LoadDisseminator.java (line
39) Disseminating load info ...
ERROR [main] 2009-10-29 23:25:43,754 CassandraDaemon.java (line 184)
Exception encountered during startup.
java.lang.AssertionError
        at org.apache.cassandra.dht.BootStrapper.<init>(BootStrapper.java:84)
        at org.apache.cassandra.service.StorageService.start(StorageService.java:267)
        at org.apache.cassandra.service.CassandraServer.start(CassandraServer.java:72)
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:94)
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:166)

Thoughts?

Edmond

On Wed, Oct 28, 2009 at 2:24 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Wed, Oct 28, 2009 at 1:15 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> Sounds reasonable.  Until CASSANDRA-435 is complete, there's no way
>> currently to take down a node and have it be removed from the list of
>> nodes that's responsible for the data in its token range, correct?
>> All other nodes will just assume that it's temporarily unavailable?
>
> Right.
>
>> Assume that we had the ability to permanently remove a node.  Would
>> modifying the token on an existing node and restarting it with
>> bootstrapping somehow be incorrect, or merely not performant b/c we'll
>> be performing lazy repair on most reads until the node is up to date?
>
> If you permanently remove a node, wipe its data directory, and restart
> it, it's effectively a new node, so everything works fine.  If you
> don't wipe its data directory it won't bootstrap (and it will ignore a
> new token in the configuration file in favor of the one it stored in
> the system table) since it will say "hey, I must have crashed and
> restarted.  Here I am again guys!"
>
> Bootstrap is for new nodes.  Don't try to be too clever. :)
>
>> if I wanted to
>> migrate my cluster to a completely new set of machines.  I would then
>> bootstrap all the new nodes in the new cluster, and then decommission
>> my old nodes one by one (assuming
>> https://issues.apache.org/jira/browse/CASSANDRA-435 was done).  After
>> the migration, all my nodes would've been bootstrapped.
>
> Sure.
>
> -Jonathan
>

Re: on bootstrapping a node

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Oct 28, 2009 at 1:15 PM, Edmond Lau <ed...@ooyala.com> wrote:
> Sounds reasonable.  Until CASSANDRA-435 is complete, there's no way
> currently to take down a node and have it be removed from the list of
> nodes that's responsible for the data in its token range, correct?
> All other nodes will just assume that it's temporarily unavailable?

Right.

> Assume that we had the ability to permanently remove a node.  Would
> modifying the token on an existing node and restarting it with
> bootstrapping somehow be incorrect, or merely not performant b/c we'll
> be performing lazy repair on most reads until the node is up to date?

If you permanently remove a node, wipe its data directory, and restart
it, it's effectively a new node, so everything works fine.  If you
don't wipe its data directory it won't bootstrap (and it will ignore a
new token in the configuration file in favor of the one it stored in
the system table) since it will say "hey, I must have crashed and
restarted.  Here I am again guys!"

Bootstrap is for new nodes.  Don't try to be too clever. :)
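
To spell out the precedence, it behaves something like this sketch (not the
actual SystemTable code; the token values are arbitrary):

import java.math.BigInteger;

public class TokenPrecedenceSketch {
    // A token already saved in the system table wins over whatever
    // token the configuration file specifies now.
    static BigInteger chooseToken(BigInteger savedToken, BigInteger configToken) {
        return savedToken != null ? savedToken : configToken;
    }

    public static void main(String[] args) {
        BigInteger saved = new BigInteger("42");   // stored in the system table earlier
        BigInteger config = new BigInteger("100"); // freshly edited config value, ignored
        System.out.println(chooseToken(saved, config)); // prints 42
    }
}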

> if I wanted to
> migrate my cluster to a completely new set of machines.  I would then
> bootstrap all the new nodes in the new cluster, and then decommission
> my old nodes one by one (assuming
> https://issues.apache.org/jira/browse/CASSANDRA-435 was done).  After
> the migration, all my nodes would've been bootstrapped.

Sure.

-Jonathan

Re: on bootstrapping a node

Posted by Edmond Lau <ed...@ooyala.com>.
On Tue, Oct 27, 2009 at 10:02 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> On Tue, Oct 27, 2009 at 10:10 PM, Edmond Lau <ed...@ooyala.com> wrote:
>> I'd like to improve my mental model of how Cassandra bootstrapping
>> works.  My understanding is that bootstrapping is just an extra step
>> during a node's startup where the node copies data from neighboring
>> nodes that, according to its token, it should own; afterwards, the
>> node behaves like any other node.
>>
>> If that's correct, I have a few follow-up questions:
>>
>> - At what point does the new node get inserted into the hash ring so
>> that reads/writes for keys get directed to it?
>
> It starts accepting reads when bootstrap is complete, but during
> bootstrap it will also be forwarded writes so it stays up-to-date.

Got it - good to know.

>
>> - What are the semantics of bootstrapping a node that's been in the
>> cluster before and already has some data that's possibly outdated?
>> Should this work?  This might be useful if a node's been out of
>> commission for a sufficiently long period of time.
>
> Cluster membership is intended to be relatively stable.  When removing
> nodes is supported (see
> https://issues.apache.org/jira/browse/CASSANDRA-435), it won't be
> worth special-casing the case of "new node that has some data locally
> that is arbitrarily out of date."  So you will want to wipe its data
> directories and re-bootstrap it.

Sounds reasonable.  Until CASSANDRA-435 is complete, there's no way
currently to take down a node and have it be removed from the list of
nodes that's responsible for the data in its token range, correct?
All other nodes will just assume that it's temporarily unavailable?

>
>> - If we pick a poor set of initial tokens, would it be sensible to
>> modify the tokens on existing nodes and then restart them with
>> bootstrapping in order to rebalance?
>
> Not until https://issues.apache.org/jira/browse/CASSANDRA-193 is done
> so you can fix the damage you've done to your replicas.

Assume that we had the ability to permanently remove a node.  Would
modifying the token on an existing node and restarting it with
bootstrapping somehow be incorrect, or merely not performant b/c we'll
be performing lazy repair on most reads until the node is up to date?

>
>> I've also noticed that I can get my Cassandra cluster into a weird
>> state via bootstrapping, where it stops accepting reads/writes.  I'm
>> on Cassandra 0.4.1.  A simple repro case is to start all 3 nodes of a
>> 3 node cluster (replication factor of 2) using bootstrapping.  Getting
>> a key that I've inserted then leads to an IndexOutOfBoundsException.
>> Another IndexOutOfBoundsException was thrown later while flushing.
>
> Starting all nodes in bootstrap mode is not a supported operation.
> Don't do that.  (Bootstrap is much more automatic in trunk, and fixes
> this specific problem as well as others.)

The unsupported operation is that all nodes in the cluster are
simultaneously in bootstrap mode, and not that all nodes in the
cluster were bootstrapped at some point in time, right?  A reasonable
scenario that could cause the second situation would be if I wanted to
migrate my cluster to a completely new set of machines.  I would then
bootstrap all the new nodes in the new cluster, and then decommission
my old nodes one by one (assuming
https://issues.apache.org/jira/browse/CASSANDRA-435 was done).  After
the migration, all my nodes would've been bootstrapped.

Thanks for the info,
Edmond

>
> -Jonathan
>

Re: on bootstrapping a node

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Oct 27, 2009 at 10:10 PM, Edmond Lau <ed...@ooyala.com> wrote:
> I'd like to improve my mental model of how Cassandra bootstrapping
> works.  My understanding is that bootstrapping is just an extra step
> during a node's startup where the node copies data from neighboring
> nodes that, according to its token, it should own; afterwards, the
> node behaves like any other node.
>
> If that's correct, I have a few follow-up questions:
>
> - At what point does the new node get inserted into the hash ring so
> that reads/writes for keys get directed to it?

It starts accepting reads when bootstrap is complete, but during
bootstrap it will also be forwarded writes so it stays up-to-date.
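
Roughly, the routing rule is as in this sketch (not the actual StorageProxy
code; the endpoints are made up):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BootstrapRoutingSketch {
    public static void main(String[] args) {
        // Nodes already in the ring for some key's range.
        List<String> naturalEndpoints = Arrays.asList("10.0.0.1", "10.0.0.2");
        // A node currently bootstrapping into that range.
        List<String> bootstrappingEndpoints = Arrays.asList("10.0.0.3");

        // Writes are forwarded to the joining node too, so it stays current.
        List<String> writeTargets = new ArrayList<String>(naturalEndpoints);
        writeTargets.addAll(bootstrappingEndpoints);

        // Reads only go to nodes that have finished joining.
        List<String> readTargets = naturalEndpoints;

        System.out.println("writes -> " + writeTargets);
        System.out.println("reads  -> " + readTargets);
    }
}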

> - What are the semantics of bootstrapping a node that's been in the
> cluster before and already has some data that's possibly outdated?
> Should this work?  This might be useful if a node's been out of
> commission for a sufficiently long period of time.

Cluster membership is intended to be relatively stable.  When removing
nodes is supported (see
https://issues.apache.org/jira/browse/CASSANDRA-435), it won't be
worth special-casing the case of "new node that has some data locally
that is arbitrarily out of date."  So you will want to wipe its data
directories and re-bootstrap it.

> - If we pick a poor set of initial tokens, would it be sensible to
> modify the tokens on existing nodes and then restart them with
> bootstrapping in order to rebalance?

Not until https://issues.apache.org/jira/browse/CASSANDRA-193 is done
so you can fix the damage you've done to your replicas.

> I've also noticed that I can get my Cassandra cluster into a weird
> state via bootstrapping, where it stops accepting reads/writes.  I'm
> on Cassandra 0.4.1.  A simple repro case is to start all 3 nodes of a
> 3 node cluster (replication factor of 2) using bootstrapping.  Getting
> a key that I've inserted then leads to an IndexOutOfBoundsException.
> Another IndexOutOfBoundsException was thrown later while flushing.

Starting all nodes in bootstrap mode is not a supported operation.
Don't do that.  (Bootstrap is much more automatic in trunk, and fixes
this specific problem as well as others.)

-Jonathan