Posted to user@cassandra.apache.org by Igor Katkov <ik...@gmail.com> on 2009/10/01 18:26:45 UTC

distributing tokens equally along the key distribution space

Hi,

Question#1:
How to manually select tokens to force equal spacing of tokens around the
hash space?
If RandomPartitioner is used, a token is a BigInteger, so there is no
[0, max value] interval to select token values from.
If everything is left to defaults, a token is a random number (hash of a
GUID), so these 10 generated tokens will not be evenly spaced on the ring.
Suppose I have X nodes, what would be correct token values?

Question#2:
Let's assume that #1 was resolved somehow and key distribution is more or
less even.
A new node "C" joins the cluster. Its token falls somewhere between two
other tokens on the ring (from nodes "A" and "B" clockwise-ordered). From
now on "C" is responsible for a portion of data that used to exclusively
belong to "B".
a. Cassandra v.0.4 will not automatically transfer this data to "C" will it?

b. Do all reads to these keys fail?
c. What happens to the data referenced by these keys on "B"? It will never
be accessed there, therefore it becomes garbage. Since there is no GC, will
it stick around forever?
d. What happens to replicas of these keys?

Re: distributing tokens equally along the key distribution space

Posted by Jonathan Ellis <jb...@gmail.com>.
yes

On Thu, Oct 1, 2009 at 12:49 PM, Igor Katkov <ik...@gmail.com> wrote:
> I see, so to make cluster always balanced (data-wise) number of nodes should
> be doubled each time.
> I see some activity in JIRA regarding load-balancing for v.0.5
> Does it target the same thing? transferring data from node to node and
> appropriately modifying tokens?
>
> On Thu, Oct 1, 2009 at 1:42 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> You basically have two options.  You can wipe your data, change the
>> tokens, and reload things, or you can add new nodes with -b to
>> rebalance things that way.
>>
>> On Thu, Oct 1, 2009 at 12:34 PM, Igor Katkov <ik...@gmail.com> wrote:
>> > OK, so I don't need to use tokenupdater, what are the steps to rebalance
>> > data around the circle?
>> >
>> > In my test example (see below), I have A, D, B and C (clockwise) where
>> > A holds 1/3 of the data
>> > D - 1/6
>> > B - 1/6
>> > C - 1/3
>> > I'm willing to change tokens manually, it's all right.
>> > How do I tell all nodes to move data around in version 0.4? Do I change
>> > token on node A and restart it with -b? Then same for the rest?
>> > restarting
>> > only one node at a time?
>> >
>> >
>> >
>> > On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <jb...@gmail.com>
>> > wrote:
>> >>
>> >> tokenupdater does not move data around; it's just an alternative to
>> >> setting <initialtoken> on each node.  so you really want to get your
>> >> tokens right for your initial set of nodes before adding data.
>> >>
>> >> we're finishing up full load balancing for 0.5 but even then it's best
>> >> to start with a reasonable distribution instead of starting with
>> >> random and forcing the balancer to move things around a bunch.
>> >>
>> >> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ik...@gmail.com> wrote:
>> >> > What is the correct procedure for data re-partitioning?
>> >> > Suppose I have 3 nodes - "A", "B", "C"
>> >> > tokens on the ring:
>> >> > A: 0
>> >> > B: 2.8356863910078205288614550619314e+37
>> >> > C: 5.6713727820156410577229101238628e+37
>> >> >
>> >> > Then I add node "D", token: 1.4178431955039102644307275309655e+37
>> >> > (B/2)
>> >> > Start node "D" with -b
>> >> > Wait
>> >> > Run nodeprobe -host hostB ... cleanup on live "B"
>> >> > Wait
>> >> > Done
>> >> >
>> >> > Now data is not evenly balanced because tokens are not evenly spaced.
>> >> > I
>> >> > see
>> >> > that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater)
>> >> > What happens with keys and data if I run it on "A", "B", "C" and "D"
>> >> > with
>> >> > new, better spaced tokens? Should I? is there a better procedure?
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jb...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ik...@gmail.com>
>> >> >> wrote:
>> >> >> > Hi,
>> >> >> >
>> >> >> > Question#1:
>> >> >> > How to manually select tokens to force equal spacing of tokens
>> >> >> > around
>> >> >> > the
>> >> >> > hash space?
>> >> >>
>> >> >> (Answered by Jun.)
>> >> >>
>> >> >> > Question#2:
>> >> >> > Let's assume that #1 was resolved somehow and key distribution is
>> >> >> > more
>> >> >> > or
>> >> >> > less even.
>> >> >> > A new node "C" joins the cluster. Its token falls somewhere
>> >> >> > between
>> >> >> > two
>> >> >> > other tokens on the ring (from nodes "A" and "B"
>> >> >> > clockwise-ordered).
>> >> >> > From
>> >> >> > now on "C" is responsible for a portion of data that used to
>> >> >> > exclusively
>> >> >> > belong to "B".
>> >> >> > a. Cassandra v.0.4 will not automatically transfer this data to
>> >> >> > "C"
>> >> >> > will
>> >> >> > it?
>> >> >>
>> >> >> It will, if you start C with the -b ("bootstrap") flag.
>> >> >>
>> >> >> > b. Do all reads to these keys fail?
>> >> >>
>> >> >> No.
>> >> >>
>> >> >> > c. What happens to the data referenced by these keys on "B"? It
>> >> >> > will never be accessed there, therefore it becomes garbage. Since
>> >> >> > there is no GC, will it stick around forever?
>> >> >>
>> >> >> nodeprobe cleanup after the bootstrap completes will instruct B to
>> >> >> throw out data that has been copied to C.
>> >> >>
>> >> >> > d. What happens to replicas of these keys?
>> >> >>
>> >> >> These are also handled by -b.
>> >> >>
>> >> >> -Jonathan
>> >> >
>> >> >
>> >
>> >
>
>

Re: distributing tokens equally along the key distribution space

Posted by Igor Katkov <ik...@gmail.com>.
I see, so to keep the cluster always balanced (data-wise), the number of
nodes should be doubled each time.
I see some activity in JIRA regarding load-balancing for v.0.5.
Does it target the same thing? Transferring data from node to node and
appropriately modifying tokens?
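That doubling intuition is easy to check numerically. A minimal sketch, with a
tiny made-up ring size for readability (real tokens live in [0, 2^127 - 1]):

```python
# Why doubling keeps a ring balanced: splitting *every* arc at its midpoint
# halves all gaps uniformly, while adding a single node leaves unequal gaps.

RING = 120  # illustrative ring size, not Cassandra's 2^127 space

def gaps(tokens, ring=RING):
    """Clockwise gap between each token and its successor on the ring."""
    ts = sorted(tokens)
    return [(ts[(i + 1) % len(ts)] - ts[i]) % ring for i in range(len(ts))]

three = [0, 40, 80]              # balanced 3-node ring
four = three + [20]              # one new node: gaps become uneven
six = three + [20, 60, 100]      # doubled: every gap split, even again

print(gaps(three))  # [40, 40, 40]
print(gaps(four))   # [20, 20, 40, 40]
print(gaps(six))    # [20, 20, 20, 20, 20, 20]
```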

On Thu, Oct 1, 2009 at 1:42 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> You basically have two options.  You can wipe your data, change the
> tokens, and reload things, or you can add new nodes with -b to
> rebalance things that way.
>
> On Thu, Oct 1, 2009 at 12:34 PM, Igor Katkov <ik...@gmail.com> wrote:
> > OK, so I don't need to use tokenupdater, what are the steps to rebalance
> > data around the circle?
> >
> > In my test example (see below), I have A, D, B and C (clockwise) where
> > A holds 1/3 of the data
> > D - 1/6
> > B - 1/6
> > C - 1/3
> > I'm willing to change tokens manually, it's all right.
> > How do I tell all nodes to move data around in version 0.4? Do I change
> > token on node A and restart it with -b? Then same for the rest?
> restarting
> > only one node at a time?
> >
> >
> >
> > On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >>
> >> tokenupdater does not move data around; it's just an alternative to
> >> setting <initialtoken> on each node.  so you really want to get your
> >> tokens right for your initial set of nodes before adding data.
> >>
> >> we're finishing up full load balancing for 0.5 but even then it's best
> >> to start with a reasonable distribution instead of starting with
> >> random and forcing the balancer to move things around a bunch.
> >>
> >> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ik...@gmail.com> wrote:
> >> > What is the correct procedure for data re-partitioning?
> >> > Suppose I have 3 nodes - "A", "B", "C"
> >> > tokens on the ring:
> >> > A: 0
> >> > B: 2.8356863910078205288614550619314e+37
> >> > C: 5.6713727820156410577229101238628e+37
> >> >
> >> > Then I add node "D", token: 1.4178431955039102644307275309655e+37
> (B/2)
> >> > Start node "D" with -b
> >> > Wait
> >> > Run nodeprobe -host hostB ... cleanup on live "B"
> >> > Wait
> >> > Done
> >> >
> >> > Now data is not evenly balanced because tokens are not evenly spaced.
> I
> >> > see
> >> > that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater)
> >> > What happens with keys and data if I run it on "A", "B", "C" and "D"
> >> > with
> >> > new, better spaced tokens? Should I? is there a better procedure?
> >> >
> >> >
> >> >
> >> >
> >> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jb...@gmail.com>
> >> > wrote:
> >> >>
> >> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ik...@gmail.com>
> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > Question#1:
> >> >> > How to manually select tokens to force equal spacing of tokens
> around
> >> >> > the
> >> >> > hash space?
> >> >>
> >> >> (Answered by Jun.)
> >> >>
> >> >> > Question#2:
> >> >> > Let's assume that #1 was resolved somehow and key distribution is
> >> >> > more
> >> >> > or
> >> >> > less even.
> >> >> > A new node "C" joins the cluster. Its token falls somewhere
> between
> >> >> > two
> >> >> > other tokens on the ring (from nodes "A" and "B"
> clockwise-ordered).
> >> >> > From
> >> >> > now on "C" is responsible for a portion of data that used to
> >> >> > exclusively
> >> >> > belong to "B".
> >> >> > a. Cassandra v.0.4 will not automatically transfer this data to "C"
> >> >> > will
> >> >> > it?
> >> >>
> >> >> It will, if you start C with the -b ("bootstrap") flag.
> >> >>
> >> >> > b. Do all reads to these keys fail?
> >> >>
> >> >> No.
> >> >>
> >> >> > c. What happens to the data referenced by these keys on "B"? It
> >> >> > will never be accessed there, therefore it becomes garbage. Since
> >> >> > there is no GC, will it stick around forever?
> >> >>
> >> >> nodeprobe cleanup after the bootstrap completes will instruct B to
> >> >> throw out data that has been copied to C.
> >> >>
> >> >> > d. What happens to replicas of these keys?
> >> >>
> >> >> These are also handled by -b.
> >> >>
> >> >> -Jonathan
> >> >
> >> >
> >
> >
>

Re: distributing tokens equally along the key distribution space

Posted by Jonathan Ellis <jb...@gmail.com>.
You basically have two options.  You can wipe your data, change the
tokens, and reload things, or you can add new nodes with -b to
rebalance things that way.

On Thu, Oct 1, 2009 at 12:34 PM, Igor Katkov <ik...@gmail.com> wrote:
> OK, so I don't need to use tokenupdater, what are the steps to rebalance
> data around the circle?
>
> In my test example (see below), I have A, D, B and C (clockwise) where
> A holds 1/3 of the data
> D - 1/6
> B - 1/6
> C - 1/3
> I'm willing to change tokens manually, it's all right.
> How do I tell all nodes to move data around in version 0.4? Do I change
> token on node A and restart it with -b? Then same for the rest? restarting
> only one node at a time?
>
>
>
> On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> tokenupdater does not move data around; it's just an alternative to
>> setting <initialtoken> on each node.  so you really want to get your
>> tokens right for your initial set of nodes before adding data.
>>
>> we're finishing up full load balancing for 0.5 but even then it's best
>> to start with a reasonable distribution instead of starting with
>> random and forcing the balancer to move things around a bunch.
>>
>> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ik...@gmail.com> wrote:
>> > What is the correct procedure for data re-partitioning?
>> > Suppose I have 3 nodes - "A", "B", "C"
>> > tokens on the ring:
>> > A: 0
>> > B: 2.8356863910078205288614550619314e+37
>> > C: 5.6713727820156410577229101238628e+37
>> >
>> > Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
>> > Start node "D" with -b
>> > Wait
>> > Run nodeprobe -host hostB ... cleanup on live "B"
>> > Wait
>> > Done
>> >
>> > Now data is not evenly balanced because tokens are not evenly spaced. I
>> > see
>> > that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater)
>> > What happens with keys and data if I run it on "A", "B", "C" and "D"
>> > with
>> > new, better spaced tokens? Should I? is there a better procedure?
>> >
>> >
>> >
>> >
>> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jb...@gmail.com>
>> > wrote:
>> >>
>> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ik...@gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > Question#1:
>> >> > How to manually select tokens to force equal spacing of tokens around
>> >> > the
>> >> > hash space?
>> >>
>> >> (Answered by Jun.)
>> >>
>> >> > Question#2:
>> >> > Let's assume that #1 was resolved somehow and key distribution is
>> >> > more
>> >> > or
>> >> > less even.
>> >> > A new node "C" joins the cluster. Its token falls somewhere between
>> >> > two
>> >> > other tokens on the ring (from nodes "A" and "B" clockwise-ordered).
>> >> > From
>> >> > now on "C" is responsible for a portion of data that used to
>> >> > exclusively
>> >> > belong to "B".
>> >> > a. Cassandra v.0.4 will not automatically transfer this data to "C"
>> >> > will
>> >> > it?
>> >>
>> >> It will, if you start C with the -b ("bootstrap") flag.
>> >>
>> >> > b. Do all reads to these keys fail?
>> >>
>> >> No.
>> >>
>> >> > c. What happens to the data referenced by these keys on "B"? It will
>> >> > never be accessed there, therefore it becomes garbage. Since there is
>> >> > no GC, will it stick around forever?
>> >>
>> >> nodeprobe cleanup after the bootstrap completes will instruct B to
>> >> throw out data that has been copied to C.
>> >>
>> >> > d. What happens to replicas of these keys?
>> >>
>> >> These are also handled by -b.
>> >>
>> >> -Jonathan
>> >
>> >
>
>

Re: distributing tokens equally along the key distribution space

Posted by Igor Katkov <ik...@gmail.com>.
OK, so I don't need to use tokenupdater, what are the steps to rebalance
data around the circle?

In my test example (see below), I have A, D, B and C (clockwise) where
A holds 1/3 of the data
D - 1/6
B - 1/6
C - 1/3
I'm willing to change tokens manually, it's all right.
How do I tell all nodes to move data around in version 0.4? Do I change the
token on node A and restart it with -b? Then the same for the rest,
restarting only one node at a time?



On Thu, Oct 1, 2009 at 1:22 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> tokenupdater does not move data around; it's just an alternative to
> setting <initialtoken> on each node.  so you really want to get your
> tokens right for your initial set of nodes before adding data.
>
> we're finishing up full load balancing for 0.5 but even then it's best
> to start with a reasonable distribution instead of starting with
> random and forcing the balancer to move things around a bunch.
>
> On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ik...@gmail.com> wrote:
> > What is the correct procedure for data re-partitioning?
> > Suppose I have 3 nodes - "A", "B", "C"
> > tokens on the ring:
> > A: 0
> > B: 2.8356863910078205288614550619314e+37
> > C: 5.6713727820156410577229101238628e+37
> >
> > Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
> > Start node "D" with -b
> > Wait
> > Run nodeprobe -host hostB ... cleanup on live "B"
> > Wait
> > Done
> >
> > Now data is not evenly balanced because tokens are not evenly spaced. I
> see
> > that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater)
> > What happens with keys and data if I run it on "A", "B", "C" and "D" with
> > new, better spaced tokens? Should I? is there a better procedure?
> >
> >
> >
> >
> > On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >>
> >> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ik...@gmail.com> wrote:
> >> > Hi,
> >> >
> >> > Question#1:
> >> > How to manually select tokens to force equal spacing of tokens around
> >> > the
> >> > hash space?
> >>
> >> (Answered by Jun.)
> >>
> >> > Question#2:
> >> > Let's assume that #1 was resolved somehow and key distribution is more
> >> > or
> >> > less even.
> >> > A new node "C" joins the cluster. Its token falls somewhere between
> two
> >> > other tokens on the ring (from nodes "A" and "B" clockwise-ordered).
> >> > From
> >> > now on "C" is responsible for a portion of data that used to
> exclusively
> >> > belong to "B".
> >> > a. Cassandra v.0.4 will not automatically transfer this data to "C"
> will
> >> > it?
> >>
> >> It will, if you start C with the -b ("bootstrap") flag.
> >>
> >> > b. Do all reads to these keys fail?
> >>
> >> No.
> >>
> >> > c. What happens to the data referenced by these keys on "B"? It will
> >> > never be accessed there, therefore it becomes garbage. Since there is
> >> > no GC, will it stick around forever?
> >>
> >> nodeprobe cleanup after the bootstrap completes will instruct B to
> >> throw out data that has been copied to C.
> >>
> >> > d. What happens to replicas of these keys?
> >>
> >> These are also handled by -b.
> >>
> >> -Jonathan
> >
> >
>

Re: distributing tokens equally along the key distribution space

Posted by Jonathan Ellis <jb...@gmail.com>.
tokenupdater does not move data around; it's just an alternative to
setting <initialtoken> on each node.  so you really want to get your
tokens right for your initial set of nodes before adding data.

we're finishing up full load balancing for 0.5 but even then it's best
to start with a reasonable distribution instead of starting with
random and forcing the balancer to move things around a bunch.

On Thu, Oct 1, 2009 at 12:14 PM, Igor Katkov <ik...@gmail.com> wrote:
> What is the correct procedure for data re-partitioning?
> Suppose I have 3 nodes - "A", "B", "C"
> tokens on the ring:
> A: 0
> B: 2.8356863910078205288614550619314e+37
> C: 5.6713727820156410577229101238628e+37
>
> Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
> Start node "D" with -b
> Wait
> Run nodeprobe -host hostB ... cleanup on live "B"
> Wait
> Done
>
> Now data is not evenly balanced because tokens are not evenly spaced. I see
> that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater)
> What happens with keys and data if I run it on "A", "B", "C" and "D" with
> new, better spaced tokens? Should I? is there a better procedure?
>
>
>
>
> On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ik...@gmail.com> wrote:
>> > Hi,
>> >
>> > Question#1:
>> > How to manually select tokens to force equal spacing of tokens around
>> > the
>> > hash space?
>>
>> (Answered by Jun.)
>>
>> > Question#2:
>> > Let's assume that #1 was resolved somehow and key distribution is more
>> > or
>> > less even.
>> > A new node "C" joins the cluster. Its token falls somewhere between two
>> > other tokens on the ring (from nodes "A" and "B" clockwise-ordered).
>> > From
>> > now on "C" is responsible for a portion of data that used to exclusively
>> > belong to "B".
>> > a. Cassandra v.0.4 will not automatically transfer this data to "C" will
>> > it?
>>
>> It will, if you start C with the -b ("bootstrap") flag.
>>
>> > b. Do all reads to these keys fail?
>>
>> No.
>>
>> > c. What happens to the data referenced by these keys on "B"? It will
>> > never be accessed there, therefore it becomes garbage. Since there is
>> > no GC, will it stick around forever?
>>
>> nodeprobe cleanup after the bootstrap completes will instruct B to
>> throw out data that has been copied to C.
>>
>> > d. What happens to replicas of these keys?
>>
>> These are also handled by -b.
>>
>> -Jonathan
>
>

Re: distributing tokens equally along the key distribution space

Posted by Igor Katkov <ik...@gmail.com>.
What is the correct procedure for data re-partitioning?
Suppose I have 3 nodes - "A", "B", "C"
tokens on the ring:
A: 0
B: 2.8356863910078205288614550619314e+37
C: 5.6713727820156410577229101238628e+37

Then I add node "D", token: 1.4178431955039102644307275309655e+37 (B/2)
Start node "D" with -b
Wait
Run nodeprobe -host hostB ... cleanup on live "B"
Wait
Done
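(For what it's worth, the token chosen for "D" above is just the midpoint of
the arc "D" takes over. A hedged sketch of that arithmetic in plain Python,
not Cassandra code; the helper name is made up:)

```python
# Midpoint of the clockwise arc between two neighboring tokens, modulo the
# ring size, as used above to pick D's token (half of B's).

RING_SIZE = 2 ** 127

def midpoint_token(left, right, ring=RING_SIZE):
    """Token halfway clockwise from `left` to `right` on the ring."""
    arc = (right - left) % ring
    return (left + arc // 2) % ring

token_a = 0
token_b = 2 ** 127 // 6  # roughly B's token from the example (~2.8357e+37)
print(midpoint_token(token_a, token_b))  # ~1.4178e+37, i.e. D's token
```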

Now data is not evenly balanced because tokens are not evenly spaced. I see
that there is tokenupdater (org.apache.cassandra.tools.TokenUpdater)
What happens with keys and data if I run it on "A", "B", "C" and "D" with
new, better-spaced tokens? Should I? Is there a better procedure?




On Thu, Oct 1, 2009 at 12:48 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ik...@gmail.com> wrote:
> > Hi,
> >
> > Question#1:
> > How to manually select tokens to force equal spacing of tokens around the
> > hash space?
>
> (Answered by Jun.)
>
> > Question#2:
> > Let's assume that #1 was resolved somehow and key distribution is more or
> > less even.
> > A new node "C" joins the cluster. Its token falls somewhere between two
> > other tokens on the ring (from nodes "A" and "B" clockwise-ordered). From
> > now on "C" is responsible for a portion of data that used to exclusively
> > belong to "B".
> > a. Cassandra v.0.4 will not automatically transfer this data to "C" will
> it?
>
> It will, if you start C with the -b ("bootstrap") flag.
>
> > b. Do all reads to these keys fail?
>
> No.
>
> > c. What happens to the data referenced by these keys on "B"? It will
> > never be accessed there, therefore it becomes garbage. Since there is
> > no GC, will it stick around forever?
>
> nodeprobe cleanup after the bootstrap completes will instruct B to
> throw out data that has been copied to C.
>
> > d. What happens to replicas of these keys?
>
> These are also handled by -b.
>
> -Jonathan
>

Re: distributing tokens equally along the key distribution space

Posted by Jonathan Ellis <jb...@gmail.com>.
On Thu, Oct 1, 2009 at 11:26 AM, Igor Katkov <ik...@gmail.com> wrote:
> Hi,
>
> Question#1:
> How to manually select tokens to force equal spacing of tokens around the
> hash space?

(Answered by Jun.)

> Question#2:
> Let's assume that #1 was resolved somehow and key distribution is more or
> less even.
> A new node "C" joins the cluster. Its token falls somewhere between two
> other tokens on the ring (from nodes "A" and "B" clockwise-ordered). From
> now on "C" is responsible for a portion of data that used to exclusively
> belong to "B".
> a. Cassandra v.0.4 will not automatically transfer this data to "C" will it?

It will, if you start C with the -b ("bootstrap") flag.

> b. Do all reads to these keys fail?

No.

> c. What happens to the data referenced by these keys on "B"? It will never
> be accessed there, therefore it becomes garbage. Since there is no GC, will
> it stick around forever?

nodeprobe cleanup after the bootstrap completes will instruct B to
throw out data that has been copied to C.

> d. What happens to replicas of these keys?

These are also handled by -b.

-Jonathan

Re: distributing tokens equally along the key distribution space

Posted by Jun Rao <ju...@almaden.ibm.com>.
For #1, the random tokens are 128-bit positive bigints. So, you can
generate tokens evenly spaced between [0, 2^127-1] and set them on each
node using InitialToken in storage-conf.xml.
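(Editor's sketch of Jun's recipe in a few lines of Python; the node labels
are made up, and the resulting integers would be pasted into each node's
InitialToken:)

```python
# Equally spaced RandomPartitioner tokens in [0, 2^127 - 1]:
# node i of n gets i * 2^127 // n, so the first node gets token 0.

RING_SIZE = 2 ** 127

def evenly_spaced_tokens(n_nodes):
    """Return one token per node, equally spaced around the ring."""
    return [i * RING_SIZE // n_nodes for i in range(n_nodes)]

for name, token in zip("ABC", evenly_spaced_tokens(3)):
    print(f"{name}: {token}")
```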

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

junrao@almaden.ibm.com


Igor Katkov <ik...@gmail.com> wrote on 10/01/2009 09:26:45 AM:

> distributing tokens equally along the key distribution space
>
> Igor Katkov
>
> to:
>
> cassandra-user
>
> 10/01/2009 09:28 AM
>
> Please respond to cassandra-user
>
> Hi,
>
> Question#1:
> How to manually select tokens to force equal spacing of tokens
> around the hash space?
> If RandomPartitioner is used, a token is a BigInteger, so there is
> no [0, max value] interval to select token values from.
> If everything is left to defaults, a token is a random number (hash
> of a GUID), so these 10 generated tokens will not be evenly spaced on the
> ring.
> Suppose I have X nodes, what would be correct token values?
>
> Question#2:
> Let's assume that #1 was resolved somehow and key distribution is
> more or less even.
> A new node "C" joins the cluster. Its token falls somewhere between
> two other tokens on the ring (from nodes "A" and "B" clockwise-
> ordered). From now on "C" is responsible for a portion of data that
> used to exclusively belong to "B".
> a. Cassandra v.0.4 will not automatically transfer this data to "C" will
> it?
> b. Do all reads to these keys fail?
> c. What happens to the data referenced by these keys on "B"? It
> will never be accessed there, therefore it becomes garbage. Since
> there is no GC, will it stick around forever?
> d. What happens to replicas of these keys?