Posted to user@cassandra.apache.org by Eunsu Kim <eu...@gmail.com> on 2019/10/22 01:53:42 UTC

Curiosity in adding nodes

Hi experts,

When a new node is added, how does the coordinator find data that has not yet been streamed?

Or are new nodes not used until all data is streamed?

Thanks in advance
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org

Re: Curiosity in adding nodes

Posted by guo Maxwell <cc...@gmail.com>.

1. The node added to the ring calculates the token ranges it will own, then
fetches the data for those ranges from the nodes that originally owned it.
2. The SSTables to be streamed, and the sections of each SSTable that cover
those ranges, are then estimated.
3. Streaming then begins. Secondary indexes are built after the SSTables have
been streamed successfully.
4. When all data has been transferred, the node's status changes from joining
to normal, and the new status is recorded in the system keyspace.
5. While the data is streaming, the added node can receive writes but does
not serve reads.
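Step 5 is what answers the original question: while the new node is joining, coordinators keep reading from the nodes that already hold the data, while writes go to both those replicas and the newcomer, so nothing is missed once it switches to normal. The Python sketch below models that routing on a toy ring. It is a simplified illustration, not Cassandra's implementation: the names `replicas_for` and `route`, the small integer tokens, one token per node, and SimpleStrategy-style clockwise placement are all assumptions made for the example.

```python
from bisect import bisect_left


def replicas_for(token, ring, rf):
    """Return the rf nodes owning `token` in a toy ring {token: node}.

    A node owns the range (previous token, its token], so the first replica
    is the first node whose token is >= the key's token (wrapping around),
    and the remaining replicas are the next nodes clockwise.
    """
    tokens = sorted(ring)
    start = bisect_left(tokens, token) % len(tokens)
    return [ring[tokens[(start + i) % len(tokens)]] for i in range(rf)]


def route(token, normal_ring, joining, rf):
    """Sketch of coordinator routing while nodes in `joining` bootstrap.

    Reads use only the current (normal) replicas, which still hold all data.
    Writes also go to any joining node that will own the token once its
    bootstrap completes (a "pending" replica).
    """
    natural = replicas_for(token, normal_ring, rf)
    future = replicas_for(token, {**normal_ring, **joining}, rf)
    pending = [node for node in future if node not in natural]
    return {"read": natural, "write": natural + pending}
```

For example, with nodes A, B and C at tokens 10, 40 and 70, a joining node D at token 25 and a replication factor of 2, a key with token 20 is read from `["B", "C"]` but written to `["B", "C", "D"]`; a key with token 50 is unaffected and is read from and written to `["C", "A"]`.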

Re: Curiosity in adding nodes

Posted by Craig Pastro <si...@gmail.com>.

If I understand correctly, this is controlled by the `auto_bootstrap` setting.
If it is set to true (the default), once the node joins the cluster it is
assigned some portion of the data, and that data is streamed to it from the
other nodes. Only once the data has finished streaming will this node start
to answer queries. So to answer your question,

> Or are new nodes not used until all data is streamed?

Yes, by default.
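To make the `auto_bootstrap` behaviour concrete, here is a toy Python model of the two cases. `NewNode`, `NodeState` and `finish_bootstrap` are hypothetical names; only the `auto_bootstrap` option itself comes from Cassandra. With the default (true), the node stays in the joining state and serves no reads until streaming is done. With false, it is treated as a normal owner straight away even though it holds none of the data for its ranges, which is why reads routed to it can come back empty.

```python
from enum import Enum


class NodeState(Enum):
    JOINING = "joining"
    NORMAL = "normal"


class NewNode:
    """Hypothetical model of a node added to a cluster (not Cassandra code)."""

    def __init__(self, auto_bootstrap=True):
        self.auto_bootstrap = auto_bootstrap
        self.data = {}
        # auto_bootstrap=True: join as JOINING and wait for streaming.
        # auto_bootstrap=False: claim the token ranges as NORMAL right away.
        self.state = NodeState.JOINING if auto_bootstrap else NodeState.NORMAL

    def finish_bootstrap(self, streamed_rows):
        """Receive the streamed rows for this node's ranges, then go NORMAL."""
        if self.state is NodeState.JOINING:
            self.data.update(streamed_rows)
            self.state = NodeState.NORMAL

    def serves_reads(self):
        # Only NORMAL nodes are chosen as read replicas.
        return self.state is NodeState.NORMAL


if __name__ == "__main__":
    rows_for_new_ranges = {"k1": "v1"}

    safe = NewNode(auto_bootstrap=True)
    print(safe.serves_reads())  # False: still joining
    safe.finish_bootstrap(rows_for_new_ranges)
    print(safe.serves_reads(), safe.data.get("k1"))  # True v1

    unsafe = NewNode(auto_bootstrap=False)
    print(unsafe.serves_reads(), unsafe.data.get("k1"))  # True None
```

The last line is the risky case: the node answers reads for ranges it owns, but until the data is brought over by other means (typically a repair) it has nothing to return.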

You probably do not want to set `auto_bootstrap` to false. In fact, it is
"hidden" in `cassandra.yaml` (https://issues.apache.org/jira/browse/CASSANDRA-2447).
There are a couple of nice articles explaining why you do not want to set it
to false:
https://monzo.com/blog/2019/09/08/why-monzo-wasnt-working-on-july-29th
https://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html
