Posted to user@cassandra.apache.org by Anthony Molinaro <an...@alumni.caltech.edu> on 2010/04/16 01:10:44 UTC

Clarification on Ring operations in Cassandra 0.5.1

Hi,

  I have a cluster running on EC2, and would like to do some ring
management.  Specifically, I'd like to replace an existing node
with another node (I want to change the instance type).

  I was looking over http://wiki.apache.org/cassandra/Operations
and it seems like I could do something like:

1) shut down Cassandra on the instance I want to replace
2) create a new instance, start Cassandra with AutoBootstrap = true
3) run nodeprobe removetoken against the token of the instance I am
   replacing

Then, according to the 'Handling failure' section, the new instance will
"find the appropriate position automatically".  However, it's not clear to me
if this means it will take the same range as the shut-down node or not,
because normally AutoBootstrap == true means it will take "half the keys
from the node with the most disk space used" (from the 'Bootstrap' section).

So will the process I describe above result in what I want, a new node
replacing an old one?

Also, if the new instance takes over the range of the old instance, how
does removetoken know which instance to remove?  Does it remove the Down
instance?

Another hopefully minor question, if I bring up a new node with
AutoBootstrap = false, what happens?
Does it join the ring but without data and without token range?
Can I then 'nodeprobe move <token for range I want to take over>', and
achieve the same as step 2 above?

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by gabriele renzi <rf...@gmail.com>.

On Fri, Apr 16, 2010 at 1:10 AM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> Hi,
>
>  I have a cluster running on ec2, and would like to do some ring
> management.  Specifically, I'd like to replace an existing node
> with another node (I want to change the instance type).


Does `nodetool move` maybe do what you want?

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Jonathan Ellis <jb...@gmail.com>.

On Wed, Apr 21, 2010 at 1:48 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> So why is Token - 1 better?  Doesn't that result in more data movement
> than PreviousTokenInRing + 1?

No, because a node is responsible for (previous token, own token].  So
if you introduce token T-1 before token T, then the only keys the old
node will still be responsible for are those corresponding exactly to T.
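
To make the (previous token, own token] rule concrete, here is a minimal
sketch, not Cassandra code. It assumes a toy token space of 0..99 and the
example tokens 0, 33 and 66 used elsewhere in this thread; the helper names
owner and keys_owned are hypothetical.

```python
from bisect import bisect_left

def owner(key_token, ring_tokens):
    """Return the node token responsible for key_token.

    A node owns the range (previous token, own token], so the owner is
    the first node token >= key_token, wrapping around to the smallest
    token when no node token is that large.
    """
    tokens = sorted(ring_tokens)
    i = bisect_left(tokens, key_token)
    return tokens[i % len(tokens)]

def keys_owned(node_token, ring_tokens, ring_size=100):
    """List every key token in the toy ring that node_token is responsible for."""
    return [k for k in range(ring_size) if owner(k, ring_tokens) == node_token]

old_ring = [0, 33, 66]  # toy tokens for nodes C, A and B

# New node placed just before the old node's token (T - 1 = 65):
left_with_t_minus_1 = keys_owned(66, old_ring + [65])

# New node placed just after the previous token (PreviousTokenInRing + 1 = 34):
left_with_prev_plus_1 = keys_owned(66, old_ring + [34])

print(len(left_with_t_minus_1), len(left_with_prev_plus_1))
```

With the new node at T-1 the old node keeps only the key at T itself; with
the new node at PreviousTokenInRing+1 the old node keeps almost its whole
range, so the new node receives almost nothing.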

>> You could use scp-then-repair if you can tolerate slightly out of date
>> data being served by the new machine until the repair finishes.
>
> So with scp-then-repair, what would my config look like?  Would I specify
> the InitialToken as the same as the old token, but have AutoBootstrap
> set to false?

Right.
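
As a rough illustration of that configuration, the sketch below sets the two
options on a stand-in config document. Only the element names AutoBootstrap
and InitialToken come from this thread; the Storage root element, the sample
contents and the helper name set_replacement_options are assumptions made
for the example, so check them against a real storage-conf.xml.

```python
import xml.etree.ElementTree as ET

# Stand-in for a storage-conf.xml; the real file has many more elements.
SAMPLE_CONF = """<Storage>
  <AutoBootstrap>true</AutoBootstrap>
  <InitialToken></InitialToken>
</Storage>"""

def set_replacement_options(conf_xml, old_token):
    """Return conf_xml with bootstrap disabled and the old node's token set."""
    root = ET.fromstring(conf_xml)
    # scp-then-repair: the data is already on disk, so skip bootstrap.
    root.find("AutoBootstrap").text = "false"
    # Reuse the token of the node being replaced.
    root.find("InitialToken").text = str(old_token)
    return ET.tostring(root, encoding="unicode")

new_conf = set_replacement_options(SAMPLE_CONF, 66)
print(new_conf)
```

The old node has to be out of the ring before the new one starts, since two
nodes cannot hold the same token at once.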

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

On Wed, Apr 21, 2010 at 12:05:07PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 11:31 AM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> >
> > On Wed, Apr 21, 2010 at 11:08:19AM -0500, Jonathan Ellis wrote:
> >> Yes, that looks right, where "token really close" means "slightly less
> >> than" (more than would move it into a different node's range).
> >
> > Is it better to go slightly less than (say Token - 1), or slightly more than
> > the beginning of the range (PreviousTokenInRing + 1).  I was assuming the
> > latter in my earlier email, but you seem to be suggesting the former?
> 
> Right, the former.

So why is Token - 1 better?  Doesn't that result in more data movement
than PreviousTokenInRing + 1?

> > Right, I was mostly wondering if I could speed things up by scping the
> > sstables while the system was running (since they shouldn't be changing).
> > Then in quick succession removetoken and bootstrap with the old token.
> > Probably grasping at straws here :b
> 
> Nope, bootstrap ignores any local data.
> 
> You could use scp-then-repair if you can tolerate slightly out of date
> data being served by the new machine until the repair finishes.

So with scp-then-repair, what would my config look like?  Would I specify
the InitialToken as the same as the old token, but have AutoBootstrap
set to false?  I guess this is interesting to me because I could do something
where I migrate my data on a running server to an attached ebs, then
after it's synced, detach and re-attach to the new machine.

Anyway, thanks for discussing the possibilities,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

unsubscribe

Posted by Jennifer Huynh <je...@arrayent.com>.

Anyone know how to unsubscribe from the mailing list? I tried emailing the
server, user-unsubcribe@cassandra.apache.org, and had no luck.

Thanks in advance!!!

Re: unsubscribe

Posted by Jeremy Dunck <jd...@gmail.com>.

You have a typo: user-unsubscribe@cassandra.apache.org, not
user-unsubcribe@cassandra.apache.org.

:-)

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Jonathan Ellis <jb...@gmail.com>.

On Wed, Apr 21, 2010 at 11:31 AM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
>
> On Wed, Apr 21, 2010 at 11:08:19AM -0500, Jonathan Ellis wrote:
>> Yes, that looks right, where "token really close" means "slightly less
>> than" (more than would move it into a different node's range).
>
> Is it better to go slightly less than (say Token - 1), or slightly more than
> the beginning of the range (PreviousTokenInRing + 1).  I was assuming the
> latter in my earlier email, but you seem to be suggesting the former?

Right, the former.

> Right, I was mostly wondering if I could speed things up by scping the
> sstables while the system was running (since they shouldn't be changing).
> Then in quick succession removetoken and bootstrap with the old token.
> Probably grasping at straws here :b

Nope, bootstrap ignores any local data.

You could use scp-then-repair if you can tolerate slightly out of date
data being served by the new machine until the repair finishes.

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

On Wed, Apr 21, 2010 at 11:08:19AM -0500, Jonathan Ellis wrote:
> Yes, that looks right, where "token really close" means "slightly less
> than" (more than would move it into a different node's range).

Is it better to go slightly less than (say Token - 1), or slightly more than
the beginning of the range (PreviousTokenInRing + 1)?  I was assuming the
latter in my earlier email, but you seem to be suggesting the former?

> You can't really migrate via scp since only one node with a given
> token can exist in the cluster at a time.

Right, I was mostly wondering if I could speed things up by scping the
sstables while the system was running (since they shouldn't be changing).
Then in quick succession removetoken and bootstrap with the old token.
Probably grasping at straws here :b

Thanks for the answers,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Jonathan Ellis <jb...@gmail.com>.

Yes, that looks right, where "token really close" means "slightly less
than" (slightly more than would move it into a different node's range).

You can't really migrate via scp since only one node with a given
token can exist in the cluster at a time.

-Jonathan

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

Hi,

  I'm still curious if I got the data movement right in this email from 
before?  Anyone?  Also, anyone know if I can scp the data directory from
a node I want to replace to a new machine?  The cassandra streaming seems
much slower than scp.

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Schubert Zhang <zs...@gmail.com>.

You can have a look at org.apache.cassandra.service.StorageService
    public void initServer() throws IOException

1. If AutoBootstrap=false, the node is treated as already bootstrapped
(not a new node).
Usually, the first new node is set to false.
(1) Check the system table for a saved token; if found, use it. Otherwise,
(2) check the config for InitialToken; if configured, use it. Otherwise,
(3) getRandomToken.
Please refer to
   org.apache.cassandra.service.StorageService
         public void initServer() throws IOException
and
    org.apache.cassandra.db.SystemTable
         public static synchronized StorageMetadata initMetadata() throws IOException

2. If AutoBootstrap=true, the node is treated as a new node.
    Usually, the other new nodes are set to AutoBootstrap=true.
(1) If the seed list includes this node itself, go to 1 above. Otherwise,
(2) if the node is already bootstrapped (check system table....), go to 1
above. Otherwise,
(3) get load information of the other nodes via Gossip, which waits a long time.
(4) If InitialToken is configured, use it. Otherwise,
(5) find the token of the node with the heaviest load.....
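
The decision order in the two lists above can be condensed into a small
sketch. This is not the StorageService code: choose_token and its parameters
are hypothetical, a saved token is taken to mean "already bootstrapped", the
load-based token is passed in rather than computed from Gossip, and the
127-bit random token is an assumption about the random partitioner.

```python
import random

def choose_token(auto_bootstrap, saved_token=None, initial_token=None,
                 is_seed=False, load_based_token=None):
    """Return (token, source) following the order described in this message."""
    # A token saved in the system table wins: the node has joined before.
    if saved_token is not None:
        return saved_token, "saved"
    # AutoBootstrap=false, or the node is in its own seed list:
    # configured InitialToken if present, otherwise a random token.
    if not auto_bootstrap or is_seed:
        if initial_token is not None:
            return initial_token, "configured"
        return random.getrandbits(127), "random"
    # A genuinely new node that bootstraps: InitialToken if configured,
    # otherwise a token derived from the most heavily loaded node.
    if initial_token is not None:
        return initial_token, "configured"
    return load_based_token, "load-based"

print(choose_token(True, initial_token=66))
print(choose_token(True, load_based_token=50))
```

Under these assumptions InitialToken only matters on the first start, because
afterwards the saved token takes precedence.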

In my use case, I always configure InitialToken for each new node of a
new cluster, so that I get good load balance. But when adding a new node
to a running cluster (with a lot of data), I let Cassandra find the token
via load checking.

Schubert


Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.

On Mon, Apr 19, 2010 at 03:28:26PM -0500, Jonathan Ellis wrote:
> > Can I then 'nodeprobe move <token for range I want to take over>', and
> > achieve the same as step 2 above?
> 
> You can't have two nodes with the same token in the ring at once.  So,
> you can removetoken the old node first, then bootstrap the new one
> (just specify InitialToken in the config to avoid having it guess
> one), or you can make it a 3 step process (bootstrap, remove, move) to
> avoid transferring so much data around.

So I'm still a little fuzzy for your 3 step case on why less data moves,
but let me run through the two scenarios and see where we get.  Please
correct me if I'm wrong on some point.

Let's say I have 3 nodes with random partitioner and rack unaware strategy,
which means I have something like

Node  Size   Token  KeyRange (self + next in ring)
----  ----   -----  ------------------------------
A     5 G      33    1 -> 66
B     6 G      66       34 -> 0
C     2 G       0          67 -> 33

Now let's say Node B is giving us some problems, so we want to replace it
with another node D.

We've outlined 2 processes.

In the first process you recommend

1. removetoken on node B
2. wait for data to move
3. add InitialToken of 66 and AutoBootstrap = true to node D storage-conf.xml
   then start it
4. wait for data to move

So when you do the removetoken, this will cause the following transfers
at stage 2
  Node A sends 34->66 to Node C
  Node C sends 67->0  to Node A
at stage 4
  Node A sends 34->66 to Node D
  Node C sends 67->0  to Node D

In the second process I assume you pick a token really close to another token?

1. add InitialToken of 34 and AutoBootstrap = true to node D storage-conf.xml
   then start it
2. wait for data to move
3. removetoken on node B
4. wait for data to move
5. movetoken on node D to 66
6. wait for data to move

This results in the following moves
at stage 2
  Node A/B sends 33->34 to Node D (primary token range)
  Node B sends 34->66 to Node D   (replica range)
at stage 4
  Node C sends 66->0 to Node D (replica range)
at stage 6
  No data movement as D already had 33->0

So it seems like you move all the data twice for process 1 and only a small
portion twice for process 2 (which is what you said, so hopefully I've
outlined correctly what is happening).  Does all that sound right?
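
A quick tally of the transfers listed above gives a feel for the difference.
This is only an approximation under stated assumptions: a toy token space of
0..99 (implied by the tokens 0, 33 and 66), data spread evenly over tokens,
and each listed range counted as (start, end]. The names RING_SIZE and width
are made up for the example.

```python
RING_SIZE = 100  # assumption: toy token space 0..99

def width(start, end, ring_size=RING_SIZE):
    """Number of tokens in (start, end], wrapping past the top of the ring."""
    return (end - start) % ring_size

# Transfers as listed for each process, written as (from, to) token pairs.
process_1 = [(34, 66), (67, 0),   # stage 2: removetoken on B
             (34, 66), (67, 0)]   # stage 4: bootstrap D at 66
process_2 = [(33, 34), (34, 66),  # stage 2: bootstrap D at 34
             (66, 0)]             # stage 4: removetoken on B

total_1 = sum(width(s, e) for s, e in process_1)
total_2 = sum(width(s, e) for s, e in process_2)

print("process 1 moves about", total_1, "tokens' worth of data")
print("process 2 moves about", total_2, "tokens' worth of data")
```

On these numbers process 1 ships roughly twice as much as process 2, which
matches the "all the data twice" estimate.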

Once I've run bootstrap with the InitialToken value set in the config, is
it then ignored on subsequent restarts, and if so can I just remove it
after that first time?

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Clarification on Ring operations in Cassandra 0.5.1

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Apr 15, 2010 at 6:10 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> 1) shutdown cassandra on instance I want to replace
> 2) create a new instance, start cassandra with AutoBootstrap = true
> 3) run nodeprobe removetoken against the token of the instance I am
>   replacing
>
> Then according to the 'Handling failure' the new instance will "find the
> appropriate position automatically".  However, it's not clear to me
> if this means it will take the same range as the shutdown node or not,
> because normally AutoBootstrap == true means it will take "half the keys
> from the node with the most disk space used." (from the 'Bootstrap' section).
>
> So will the process I describe above result in what I want, a new node
> replacing an old one?

As you noted, it does not exactly replace the old one.  If you require
the token to be the same as the dead one, then you should manually
move the new node, after removing the dead one.

> how
> does removetoken know which instance to remove, does it remove the Down
> instance?

Tokens are unique per node.  (Those are the values you see in nodetool ring.)

> Another hopefully minor question, if I bring up a new node with
> AutoBootstrap = false, what happens?
> Does it join the ring but without data

Yes.

> and without token range?

No.  (This is why you should not do that.)

> Can I then 'nodeprobe move <token for range I want to take over>', and
> achieve the same as step 2 above?

You can't have two nodes with the same token in the ring at once.  So,
you can removetoken the old node first, then bootstrap the new one
(just specify InitialToken in the config to avoid having it guess
one), or you can make it a 3 step process (bootstrap, remove, move) to
avoid transferring so much data around.

-Jonathan