Posted to user@cassandra.apache.org by Tom van den Berge <to...@drillster.com> on 2015/09/07 22:11:02 UTC

Re: Is it possible to bootstrap the 1st node of a new DC?

Running nodetool rebuild on a node that was started with join_ring=false
does not work, unfortunately. The nodetool command returns immediately,
after a message appears in the log that the streaming of data has started.
After that, nothing happens.

Tom

On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <to...@drillster.com>
> wrote:
>
>> Wouldn't it be far more efficient if a node that is rebuilding itself is
>> responsible for not accepting reads until the rebuild is complete? E.g. by
>> marking it as "Joining", similar to a node that is being bootstrapped?
>>
>
> Yes, and Cassandra 2.0.7 and above contain this long desired functionality.
>
> https://issues.apache.org/jira/browse/CASSANDRA-6961
>
> I presume that one can also run a rebuild in this state, though I haven't
> tried. Driftx gives it an 80% chance... try it and see and let us know? :D
>
> =Rob
>
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Tom van den Berge <to...@gmail.com>.

>> Running nodetool rebuild on a node that was started with join_ring=false
>> does not work, unfortunately. The nodetool command returns immediately,
>> after a message appears in the log that the streaming of data has started.
>> After that, nothing happens.
>
>
> Per driftx, the author of CASSANDRA-6961, this sounds like a bug. If you
> can repro, please file a JIRA and let the list know the URL.
>

I just filed https://issues.apache.org/jira/browse/CASSANDRA-10287.

(I wasn't convinced that join_ring is supposed to work in conjunction with
nodetool rebuild, since CASSANDRA-6961 only speaks of repair.)

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by horschi <ho...@gmail.com>.

Hi Samuel,

thanks a lot for the jira link. Another reason to upgrade to 2.1 :-)

regards,
Christian



On Thu, Sep 10, 2015 at 1:28 PM, Samuel CARRIERE <sa...@urssaf.fr>
wrote:

> Hi Christian,
> The problem you mention (violation of consistency) is a real one. If I have
> understood correctly, it is resolved in Cassandra 2.1 (see CASSANDRA-2434).
> Regards,
> Samuel
>
>
> horschi <ho...@gmail.com> wrote on 10/09/2015 12:41:41:
>
> > From: horschi <ho...@gmail.com>
> > To: user@cassandra.apache.org,
> > Date: 10/09/2015 12:42
> > Subject: Re: Is it possible to bootstrap the 1st node of a new DC?
> >
> > Hi Rob,
> >
> > regarding 1-3:
> > Thank you for the step-by-step explanation :-) My mistake was to use
> > join_ring=false during the initial start already. It now works for me
> > as it's supposed to. Nevertheless it does not do what I want, as it does
> > not take writes during the time of repair/rebuild: Running an 8 hour
> > repair will lead to 8 hours of data missing.
> >
> > regarding 1-6:
> > This is what we did. And it works of course. Our issue was just that
> > we had some global-QUORUMS hidden somewhere, which the operator was
> > not aware of. Therefore it would have been nice if the ops guy could
> > prevent these reads by himself.
> >
> >
> > Another issue I think the current bootstrapping process has: Doesn't
> > it practically reduce the RF for old data by one? (With old data I
> > mean any data that was written before the bootstrap).
> >
> > Let me give an example:
> >
> > Let's assume I have a cluster of nodes 1, 2 and 3 with RF=3. And let's
> > assume a single write on node 2 got lost. So this particular write
> > is only available on nodes 1 and 3.
> >
> > Now I add node 4, which takes the range in such a way that node 1
> > will not own that previously written key any more. Also assume that
> > the new node loads its data from node 2.
> >
> > This means we have a cluster where the previously mentioned write is
> > only on node 3. (Node 1 is not responsible for the key any more and
> > node 4 loaded its data from the wrong node)
> >
> > Any quorum-read that hit node 2 & 4 will not return the column. So
> > this means we effectively lowered the CL/RF.
> >
> > Therefore what I would like to be able to do is:
> > - Add new node 4, but leave it in a joining state. (This means it
> > gets all the writes but does not serve reads.)
> > - Do "nodetool rebuild"
> > - New node should not serve reads yet. And node 1 should not yet
> > give up its ranges to node 4.
> > - Do "nodetool repair", to ensure consistency.
> > - Finish bootstrap. Now node1 should not be responsible for the
> > range and node4 should become eligible for reads.
> >
> > regards,
> > Christian
> >
> > On Tue, Sep 8, 2015 at 11:51 PM, Robert Coli <rc...@eventbrite.com>
> wrote:
> > On Tue, Sep 8, 2015 at 2:39 PM, horschi <ho...@gmail.com> wrote:
> > I tried to set up a new node with join_ring=false once. In my test
> > that node did not pick a token in the ring. I assume running repair
> > or rebuild would not do anything in that case: No tokens = no data.
> > But I must admit: I have not tried running rebuild.
> >
> > I admit I haven't been following this thread closely, perhaps I have
> > missed what exactly it is you're trying to do.
> >
> > It's possible you'd need to :
> >
> > 1) join the node with auto_bootstrap=false
> > 2) immediately stop it
> > 3) re-start it with join_ring=false
> >
> > To actually use repair or rebuild in this way.
> >
> > However, if your goal is to create a new data-center and rebuild a
> > node there without any risk of reading from that node while creating
> > the new data center, you can just :
> >
> > 1) create nodes in new data-center, with RF=0 for that DC
> > 2) change RF in that DC
> > 3) run rebuild on new data-center nodes
> > 4) while doing so, don't talk to new data-center coordinators from your
> client
> > 5) and also use LOCAL_ONE/LOCAL_QUORUM to avoid cross-data-center
> > reads from your client
> > 6) modulo the handful of current bugs which make 5) currently imperfect
> >
> > What problem are you encountering with this procedure? If it's this ...
> >
> > I've learned from experience that the node immediately joins the
> > cluster, and starts accepting reads (from other DCs) for the range it
> owns.
> >
> > This seems to be the incorrect assumption at the heart of the
> > confusion. You "should" be able to prevent this behavior entirely
> > via correct use of ConsistencyLevel and client configuration.
> >
> > In an ideal world, I'd write a detailed blog post explaining this...
> > :/ in my copious spare time...
> >
> > =Rob
> >
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Samuel CARRIERE <sa...@urssaf.fr>.

Hi Christian,
The problem you mention (violation of consistency) is a real one. If I have
understood correctly, it is resolved in Cassandra 2.1 (see
CASSANDRA-2434).
Regards,
Samuel


horschi <ho...@gmail.com> wrote on 10/09/2015 12:41:41:

> From: horschi <ho...@gmail.com>
> To: user@cassandra.apache.org,
> Date: 10/09/2015 12:42
> Subject: Re: Is it possible to bootstrap the 1st node of a new DC?
> 
> Hi Rob,
> 
> regarding 1-3:
> Thank you for the step-by-step explanation :-) My mistake was to use
> join_ring=false during the initial start already. It now works for me
> as it's supposed to. Nevertheless it does not do what I want, as it does
> not take writes during the time of repair/rebuild: Running an 8 hour
> repair will lead to 8 hours of data missing.
> 
> regarding 1-6:
> This is what we did. And it works of course. Our issue was just that
> we had some global-QUORUMS hidden somewhere, which the operator was 
> not aware of. Therefore it would have been nice if the ops guy could
> prevent these reads by himself.
> 
> 
> Another issue I think the current bootstrapping process has: Doesn't
> it practically reduce the RF for old data by one? (With old data I 
> mean any data that was written before the bootstrap).
> 
> Let me give an example:
> 
> Let's assume I have a cluster of nodes 1, 2 and 3 with RF=3. And let's
> assume a single write on node 2 got lost. So this particular write
> is only available on nodes 1 and 3.
> 
> Now I add node 4, which takes the range in such a way that node 1 
> will not own that previously written key any more. Also assume that 
> the new node loads its data from node 2.
> 
> This means we have a cluster where the previously mentioned write is
> only on node 3. (Node 1 is not responsible for the key any more and 
> node 4 loaded its data from the wrong node)
> 
> Any quorum-read that hit node 2 & 4 will not return the column. So 
> this means we effectively lowered the CL/RF.
> 
> Therefore what I would like to be able to do is:
> - Add new node 4, but leave it in a joining state. (This means it 
> gets all the writes but does not serve reads.)
> - Do "nodetool rebuild"
> - New node should not serve reads yet. And node 1 should not yet 
> give up its ranges to node 4.
> - Do "nodetool repair", to ensure consistency.
> - Finish bootstrap. Now node1 should not be responsible for the 
> range and node4 should become eligible for reads.
> 
> regards,
> Christian
> 
> On Tue, Sep 8, 2015 at 11:51 PM, Robert Coli <rc...@eventbrite.com> 
wrote:
> On Tue, Sep 8, 2015 at 2:39 PM, horschi <ho...@gmail.com> wrote:
> I tried to set up a new node with join_ring=false once. In my test 
> that node did not pick a token in the ring. I assume running repair 
> or rebuild would not do anything in that case: No tokens = no data. 
> But I must admit: I have not tried running rebuild.
> 
> I admit I haven't been following this thread closely, perhaps I have
> missed what exactly it is you're trying to do.
> 
> It's possible you'd need to :
> 
> 1) join the node with auto_bootstrap=false
> 2) immediately stop it
> 3) re-start it with join_ring=false
> 
> To actually use repair or rebuild in this way.
> 
> However, if your goal is to create a new data-center and rebuild a 
> node there without any risk of reading from that node while creating
> the new data center, you can just :
> 
> 1) create nodes in new data-center, with RF=0 for that DC
> 2) change RF in that DC
> 3) run rebuild on new data-center nodes
> 4) while doing so, don't talk to new data-center coordinators from your 
client
> 5) and also use LOCAL_ONE/LOCAL_QUORUM to avoid cross-data-center 
> reads from your client
> 6) modulo the handful of current bugs which make 5) currently imperfect
> 
> What problem are you encountering with this procedure? If it's this ...
> 
> I've learned from experience that the node immediately joins the 
> cluster, and starts accepting reads (from other DCs) for the range it 
owns.
> 
> This seems to be the incorrect assumption at the heart of the 
> confusion. You "should" be able to prevent this behavior entirely 
> via correct use of ConsistencyLevel and client configuration.
> 
> In an ideal world, I'd write a detailed blog post explaining this...
> :/ in my copious spare time...
> 
> =Rob
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by horschi <ho...@gmail.com>.

Hi Rob,

regarding 1-3:
Thank you for the step-by-step explanation :-) My mistake was to use
join_ring=false during the initial start already. It now works for me as it's
supposed to. Nevertheless it does not do what I want, as it does not take
writes during the time of repair/rebuild: Running an 8 hour repair will
lead to 8 hours of data missing.

regarding 1-6:
This is what we did. And it works of course. Our issue was just that we had
some global-QUORUMS hidden somewhere, which the operator was not aware of.
Therefore it would have been nice if the ops guy could prevent these reads
by himself.
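
To illustrate why a stray global QUORUM read matters here, a minimal sketch,
assuming a hypothetical keyspace with RF=3 in an existing DC ("DC1") and RF=3
in the new one ("DC2"). It relies only on the usual quorum rule of
floor(n/2) + 1 replicas; the DC names and RF values are made up for the
example.

```python
def quorum(replicas: int) -> int:
    """Number of replicas that must respond for a quorum: floor(n / 2) + 1."""
    return replicas // 2 + 1


# Hypothetical replication settings: an existing DC and the newly added one.
rf_per_dc = {"DC1": 3, "DC2": 3}

# LOCAL_QUORUM counts replicas in the coordinator's DC only.
local_quorum = quorum(rf_per_dc["DC1"])
# QUORUM counts replicas across all DCs.
global_quorum = quorum(sum(rf_per_dc.values()))

print(f"LOCAL_QUORUM in DC1 needs {local_quorum} replicas")
print(f"QUORUM needs {global_quorum} replicas, DC1 has {rf_per_dc['DC1']}")
```

With these numbers QUORUM needs 4 replicas while DC1 holds only 3, so every
QUORUM read has to include at least one replica in the new DC, whether or not
that replica has finished rebuilding.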

Another issue I think the current bootstrapping process has: Doesn't it
practically reduce the RF for old data by one? (With old data I mean any
data that was written before the bootstrap).

Let me give an example:

Let's assume I have a cluster of nodes 1, 2 and 3 with RF=3. And let's assume a
single write on node 2 got lost. So this particular write is only available
on nodes 1 and 3.

Now I add node 4, which takes the range in such a way that node 1 will not
own that previously written key any more. Also assume that the new node
loads its data from node 2.

This means we have a cluster where the previously mentioned write is only
on node 3. (Node 1 is not responsible for the key any more and node 4
loaded its data from the wrong node)

Any quorum-read that hit node 2 & 4 will not return the column. So this
means we effectively lowered the CL/RF.
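
The example above can be replayed as a toy model. This is only a sketch of the
scenario as described in this message, not of how Cassandra actually tracks
ownership: the has_write dict and the helper function are hypothetical, and a
quorum read is simplified to "any 2 of the 3 replicas".

```python
from itertools import combinations

# Which nodes hold the write. The write to node 2 was lost.
has_write = {1: True, 2: False, 3: True}
replicas_before = [1, 2, 3]  # RF=3, all three nodes own the key

# Node 4 takes over node 1's ownership of the key and streams its data
# from node 2, so it inherits node 2's view, in which the write is missing.
has_write[4] = has_write[2]
replicas_after = [2, 3, 4]


def quorum_reads_missing_write(replicas):
    """Return every 2-replica read set in which no replica has the write."""
    return [pair for pair in combinations(replicas, 2)
            if not any(has_write[node] for node in pair)]


print(quorum_reads_missing_write(replicas_before))  # []
print(quorum_reads_missing_write(replicas_after))   # [(2, 4)]
```

Before node 4 joins, every pair of replicas contains node 1 or node 3, so a
quorum read always sees the write. Afterwards the pair (2, 4) can answer a
quorum read without it.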

Therefore what I would like to be able to do is:
- Add new node 4, but leave it in a joining state. (This means it gets all
the writes but does not serve reads.)
- Do "nodetool rebuild"
- New node should not serve reads yet. And node 1 should not yet give up
its ranges to node 4.
- Do "nodetool repair", to ensure consistency.
- Finish bootstrap. Now node1 should not be responsible for the range and
node4 should become eligible for reads.

regards,
Christian

On Tue, Sep 8, 2015 at 11:51 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Tue, Sep 8, 2015 at 2:39 PM, horschi <ho...@gmail.com> wrote:
>
>> I tried to set up a new node with join_ring=false once. In my test that
>> node did not pick a token in the ring. I assume running repair or rebuild
>> would not do anything in that case: No tokens = no data. But I must admit:
>> I have not tried running rebuild.
>>
>
> I admit I haven't been following this thread closely, perhaps I have
> missed what exactly it is you're trying to do.
>
> It's possible you'd need to :
>
> 1) join the node with auto_bootstrap=false
> 2) immediately stop it
> 3) re-start it with join_ring=false
>
> To actually use repair or rebuild in this way.
>
> However, if your goal is to create a new data-center and rebuild a node
> there without any risk of reading from that node while creating the new
> data center, you can just :
>
> 1) create nodes in new data-center, with RF=0 for that DC
> 2) change RF in that DC
> 3) run rebuild on new data-center nodes
> 4) while doing so, don't talk to new data-center coordinators from your
> client
> 5) and also use LOCAL_ONE/LOCAL_QUORUM to avoid cross-data-center reads
> from your client
> 6) modulo the handful of current bugs which make 5) currently imperfect
>
> What problem are you encountering with this procedure? If it's this ...
>
>> I've learned from experience that the node immediately joins the cluster,
>> and starts accepting reads (from other DCs) for the range it owns.
>
>
> This seems to be the incorrect assumption at the heart of the confusion.
> You "should" be able to prevent this behavior entirely via correct use of
> ConsistencyLevel and client configuration.
>
> In an ideal world, I'd write a detailed blog post explaining this... :/ in
> my copious spare time...
>
> =Rob
>
>
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Robert Coli <rc...@eventbrite.com>.

On Wed, Sep 9, 2015 at 1:05 AM, Tom van den Berge <tom.vandenberge@gmail.com>
wrote:

>
>>> I've learned from experience that the node immediately joins the cluster,
>>> and starts accepting reads (from other DCs) for the range it owns.
>>
>>
>> This seems to be the incorrect assumption at the heart of the confusion.
>> You "should" be able to prevent this behavior entirely via correct use of
>> ConsistencyLevel and client configuration.
>>
>
> That is correct, but I just learned that CASSANDRA-9753
> <https://issues.apache.org/jira/browse/CASSANDRA-9753> is (in my
> situation) causing problems by incorrectly sending reads to the new DC. A
> workaround for this bug is to set speculative_retry to 'NONE' for all
> involved tables. This seems to solve the issue for me.
>

Ok, phew, "glad" it is "just" CASSANDRA-9753
<https://issues.apache.org/jira/browse/CASSANDRA-9753>. :D

=Rob

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Tom van den Berge <to...@gmail.com>.

>
>
>> I've learned from experience that the node immediately joins the cluster,
>> and starts accepting reads (from other DCs) for the range it owns.
>
>
> This seems to be the incorrect assumption at the heart of the confusion.
> You "should" be able to prevent this behavior entirely via correct use of
> ConsistencyLevel and client configuration.
>

That is correct, but I just learned that CASSANDRA-9753
<https://issues.apache.org/jira/browse/CASSANDRA-9753> is (in my situation)
causing problems by incorrectly sending reads to the new DC. A workaround
for this bug is to set speculative_retry to 'NONE' for all involved tables.
This seems to solve the issue for me.
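
For anyone applying the same workaround to many tables, a small sketch that
generates the CQL statements. The keyspace and table names are invented for
the example, and the helper only builds strings; you would still run the
output through cqlsh or a driver yourself.

```python
def disable_speculative_retry(tables):
    """Build one CQL ALTER TABLE statement per (keyspace, table) pair."""
    return [
        f"ALTER TABLE {keyspace}.{table} WITH speculative_retry = 'NONE';"
        for keyspace, table in tables
    ]


# Hypothetical keyspace and table names.
involved_tables = [("my_ks", "users"), ("my_ks", "events")]

for statement in disable_speculative_retry(involved_tables):
    print(statement)
```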

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Sep 8, 2015 at 2:39 PM, horschi <ho...@gmail.com> wrote:

> I tried to set up a new node with join_ring=false once. In my test that
> node did not pick a token in the ring. I assume running repair or rebuild
> would not do anything in that case: No tokens = no data. But I must admit:
> I have not tried running rebuild.
>

I admit I haven't been following this thread closely, perhaps I have missed
what exactly it is you're trying to do.

It's possible you'd need to :

1) join the node with auto_bootstrap=false
2) immediately stop it
3) re-start it with join_ring=false

To actually use repair or rebuild in this way.
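
The three steps above could be written down as follows. This is only a
sketch: auto_bootstrap is a cassandra.yaml setting and join_ring is passed as
the JVM system property -Dcassandra.join_ring, while the STEPS list and the
render() helper are hypothetical names used for illustration. How you stop the
node depends on your installation, so that step is left as a note.

```python
# Each step pairs a description with the setting that the step relies on.
STEPS = [
    ("First start: join the ring without streaming data",
     "cassandra.yaml: auto_bootstrap: false"),
    ("Stop the node right after it has joined",
     "stop method depends on your installation"),
    ("Restart without rejoining the ring",
     "JVM option: -Dcassandra.join_ring=false"),
]


def render(steps):
    """Format the steps as a numbered checklist."""
    return [f"{number}) {what} [{how}]"
            for number, (what, how) in enumerate(steps, start=1)]


for line in render(STEPS):
    print(line)
```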

However, if your goal is to create a new data-center and rebuild a node
there without any risk of reading from that node while creating the new
data center, you can just :

1) create nodes in new data-center, with RF=0 for that DC
2) change RF in that DC
3) run rebuild on new data-center nodes
4) while doing so, don't talk to new data-center coordinators from your
client
5) and also use LOCAL_ONE/LOCAL_QUORUM to avoid cross-data-center reads
from your client
6) modulo the handful of current bugs which make 5) currently imperfect
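
Steps 2 and 3 above might look roughly like this, assuming a hypothetical
keyspace my_ks, an existing DC named AMS and a new DC named SJC (all three
names are invented for the example). The sketch only builds the CQL statement
and the nodetool command as strings; it does not run them.

```python
def replication_cql(keyspace, rf_per_dc):
    """Build an ALTER KEYSPACE statement for NetworkTopologyStrategy."""
    dcs = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())
    return (f"ALTER KEYSPACE {keyspace} WITH replication = "
            f"{{'class': 'NetworkTopologyStrategy', {dcs}}};")


def new_dc_plan(keyspace, existing_rf, new_dc, new_rf, source_dc):
    """Return the RF change (step 2) and the rebuild command (step 3)."""
    rf = dict(existing_rf)
    rf[new_dc] = new_rf
    return [
        replication_cql(keyspace, rf),
        # Run on each node in the new DC, streaming from source_dc.
        f"nodetool rebuild -- {source_dc}",
    ]


# Hypothetical names: keyspace my_ks, existing DC AMS, new DC SJC.
plan = new_dc_plan("my_ks", {"AMS": 3}, "SJC", 3, "AMS")
for command in plan:
    print(command)
```

Naming the source DC in the rebuild command is what lets you choose where the
new nodes stream from. Steps 4 and 5 are client-side settings and are not
covered by this sketch.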

What problem are you encountering with this procedure? If it's this ...

> I've learned from experience that the node immediately joins the cluster,
> and starts accepting reads (from other DCs) for the range it owns.

This seems to be the incorrect assumption at the heart of the confusion.
You "should" be able to prevent this behavior entirely via correct use of
ConsistencyLevel and client configuration.

In an ideal world, I'd write a detailed blog post explaining this... :/ in
my copious spare time...

=Rob

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by horschi <ho...@gmail.com>.

Hi Robert,

I tried to set up a new node with join_ring=false once. In my test that
node did not pick a token in the ring. I assume running repair or rebuild
would not do anything in that case: No tokens = no data. But I must admit:
I have not tried running rebuild.

Is a new node with join_ring=false supposed to pick tokens? From driftx's
comment in CASSANDRA-6961 I take it that it should not.

Tom: What does "nodetool status" say after you started the new node
with join_ring=false?
In my test I got a node that was not in the ring at all.

kind regards,
Christian



On Tue, Sep 8, 2015 at 9:05 PM, Robert Coli <rc...@eventbrite.com> wrote:

>
>
> On Tue, Sep 8, 2015 at 1:39 AM, horschi <ho...@gmail.com> wrote:
>
>> "The idea of join_ring=false is that other nodes are not aware of the
>> new node, and therefore never send requests to it. The new node can then be
>> repaired"
>> Nicely explained, but I still see the issue that this node would not
>> receive writes during that time. So after the repair the node would still
>> miss data.
>> Again, what is needed is either some joining-state or write-survey that
>> allows disabling reads, but still accepts writes.
>>
>
> https://issues.apache.org/jira/browse/CASSANDRA-6961
> "
> We can *almost* set join_ring to false, then repair, and then join the
> ring to narrow the window (actually, you can do this and everything
> succeeds because the node doesn't know it's a member yet, which is probably
> a bit of a bug.) If instead we modified this to put the node in hibernate,
> like replace_address does, it could work almost like replace, except you
> could run a repair (manually) while in the hibernate state, and then flip
> to normal when it's done.
> "
>
> Since 2.0.7, you should be able to use join_ring=false + repair to do the
> operation this thread discusses.
>
> Has anyone here tried and found it wanting? If so, in what way?
>
> For the record, I find various statements in this thread confusing and
> likely to be wrong :
>
> " And again, your node won't receive any writes while you are rebuilding.
>>  "
>
>
> If your RF has been increased in the new DC, sure you will, you'll get the
> writes you're supposed to get because of your RF? The challenge with
> rebuild is premature reads from the new DC, not losing writes?
>
>> Running nodetool rebuild on a node that was started with join_ring=false
>> does not work, unfortunately. The nodetool command returns immediately,
>> after a message appears in the log that the streaming of data has started.
>> After that, nothing happens.
>
>
> Per driftx, the author of CASSANDRA-6961, this sounds like a bug. If you
> can repro, please file a JIRA and let the list know the URL.
>
> =Rob
>
>
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Robert Coli <rc...@eventbrite.com>.

On Tue, Sep 8, 2015 at 1:39 AM, horschi <ho...@gmail.com> wrote:

> "The idea of join_ring=false is that other nodes are not aware of the new
> node, and therefore never send requests to it. The new node can then be
> repaired"
> Nicely explained, but I still see the issue that this node would not
> receive writes during that time. So after the repair the node would still
> miss data.
> Again, what is needed is either some joining-state or write-survey that
> allows disabling reads, but still accepts writes.
>

https://issues.apache.org/jira/browse/CASSANDRA-6961
"
We can *almost* set join_ring to false, then repair, and then join the ring
to narrow the window (actually, you can do this and everything succeeds
because the node doesn't know it's a member yet, which is probably a bit of
a bug.) If instead we modified this to put the node in hibernate, like
replace_address does, it could work almost like replace, except you could
run a repair (manually) while in the hibernate state, and then flip to
normal when it's done.
"

Since 2.0.7, you should be able to use join_ring=false + repair to do the
operation this thread discusses.

Has anyone here tried and found it wanting? If so, in what way?

For the record, I find various statements in this thread confusing and
likely to be wrong :

" And again, your node won't receive any writes while you are rebuilding. "

If your RF has been increased in the new DC, sure you will, you'll get the
writes you're supposed to get because of your RF? The challenge with
rebuild is premature reads from the new DC, not losing writes?

> Running nodetool rebuild on a node that was started with join_ring=false
> does not work, unfortunately. The nodetool command returns immediately,
> after a message appears in the log that the streaming of data has started.
> After that, nothing happens.

Per driftx, the author of CASSANDRA-6961, this sounds like a bug. If you
can repro, please file a JIRA and let the list know the URL.

=Rob

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by horschi <ho...@gmail.com>.

Hi Tom,

"The idea of join_ring=false is that other nodes are not aware of the new
node, and therefore never send requests to it. The new node can then be
repaired"
Nicely explained, but I still see the issue that this node would not
receive writes during that time. So after the repair the node would still
miss data.
Again, what is needed is either some joining-state or write-survey that
allows disabling reads, but still accepts writes.



"To set up a new DC, I was hoping that you could also rebuild (instead of a
repair) a new node while join_ring=false, but that seems not to work."
Correct. The node does not get any tokens with join_ring=false. And again,
your node won't receive any writes while you are rebuilding. Therefore you
will have outdated data at the point when you are done rebuilding.


kind regards,
Christian





On Tue, Sep 8, 2015 at 10:00 AM, Tom van den Berge <
tom.vandenberge@gmail.com> wrote:

> "one drawback: the node joins the cluster as soon as the bootstrapping
>> begins."
>> I am not sure I understand this correctly. It will get tokens, but not
>> load data if you combine it with autobootstrap=false.
>>
> Joining the cluster means that all other nodes become aware of the new
> node, and therefore it might receive reads. And yes, it will not have any
> data, because auto_bootstrap=false.
>
>
>
>> How I see it: You should be able to start all the new nodes in the new DC
>> with autobootstrap=false and survey-mode=true. Then you should have a new
>> DC with nodes that have tokens but no data. Then you can start rebuild on
>> all new nodes. During this process, the new nodes should get writes, but
>> not serve reads.
>>
> Maybe you're right.
>
>
>>
>> "It turns out that join_ring=false in this scenario does not solve this
>> problem"
>> I also don't see how join_ring would help here. (Actually I have no clue
>> where you would ever need that option)
>>
> The idea of join_ring=false is that other nodes are not aware of the new
> node, and therefore never send requests to it. The new node can then be
> repaired (see https://issues.apache.org/jira/browse/CASSANDRA-6961). To
> set up a new DC, I was hoping that you could also rebuild (instead of a
> repair) a new node while join_ring=false, but that seems not to work.
>
>>
>>
>> "Currently I'm trying to auto_bootstrap my new DC. The good thing is that
>> it doesn't accept reads from other DCs."
>> The joining-state actually works perfectly. The joining state is a state
>> where a node takes writes, but does not serve reads. It would be really cool if you
>> could boot a node into the joining state. Actually, write_survey should
>> basically be the same.
>>
> It would be great if you could select the DC from where it's bootstrapped,
> similar to nodetool rebuild. I'm currently bootstrapping a node in
> San-Jose. It decides to stream all data from another DC in Amsterdam, while
> we also have another DC in San-Jose, right next to it. Streaming data
> across the Atlantic takes a lot more time :(
>
>
>
>>
>> kind regards,
>> Christian
>>
>> PS: I would love to see the results, if you perform any tests on the
>> write-survey. Please share it here on the mailing list :-)
>>
>>
>>
>> On Mon, Sep 7, 2015 at 11:10 PM, Tom van den Berge <
>> tom.vandenberge@gmail.com> wrote:
>>
>>> Hi Christian,
>>>
>>> No, I never tried survey mode. I didn't know it until now, but from the
>>> info I was able to find it looks like it is meant for a different purpose.
>>> Maybe it can be used to bootstrap a new DC, though.
>>>
>>> On the other hand, the auto_bootstrap=false + rebuild scenario seems to
>>> be designed to do exactly what I need, except that it has one drawback: the
>>> node joins the cluster as soon as the bootstrapping begins.
>>>
>>> It turns out that join_ring=false in this scenario does not solve this
>>> problem, since nodetool rebuild does not do anything if C* is started with
>>> this option.
>>>
>>> A workaround could be to ensure that only LOCAL_* CL is used by all
>>> clients, but even then I'm seeing failed queries, because they're
>>> mysteriously routed to the new DC every now and then.
>>>
>>> Currently I'm trying to auto_bootstrap my new DC. The good thing is that
>>> it doesn't accept reads from other DCs. The bad thing is that a) I can't
>>> choose where it streams its data from, and b) the two nodes I've been
>>> trying to bootstrap crashed when they were almost finished...
>>>
>>>
>>>
>>> On Mon, Sep 7, 2015 at 10:22 PM, horschi <ho...@gmail.com> wrote:
>>>
>>>> Hi Tom,
>>>>
>>>> this sounds very much like my thread: "auto_bootstrap=false broken?"
>>>>
>>>> Did you try booting the new node with survey-mode? I wanted to try
>>>> this, but I am waiting for 2.0.17 to come out (survey mode is broken in
>>>> earlier versions). Imho survey mode is what you (and me too) want: start a
>>>> node, accepting writes, but not serving reads. I have not tested it yet,
>>>> but I think it should work.
>>>>
>>>> Also the manual join mentioned in CASSANDRA-9667 sounds very
>>>> interesting.
>>>>
>>>> kind regards,
>>>> Christian
>>>>
>>>> On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <to...@drillster.com>
>>>> wrote:
>>>>
>>>>> Running nodetool rebuild on a node that was started with
>>>>> join_ring=false does not work, unfortunately. The nodetool command returns
>>>>> immediately, after a message appears in the log that the streaming of data
>>>>> has started. After that, nothing happens.
>>>>>
>>>>> Tom
>>>>>
>>>>>
>>>>> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com>
>>>>> wrote:
>>>>>
>>>>>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <tom@drillster.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Wouldn't it be far more efficient if a node that is rebuilding
>>>>>>> itself is responsible for not accepting reads until the rebuild is
>>>>>>> complete? E.g. by marking it as "Joining", similar to a node that is being
>>>>>>> bootstrapped?
>>>>>>>
>>>>>>
>>>>>> Yes, and Cassandra 2.0.7 and above contain this long desired
>>>>>> functionality.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>>>>>
>>>>>> I presume that one can also run a rebuild in this state, though I
>>>>>> haven't tried. Driftx gives it an 80% chance... try it and see and let us
>>>>>> know? :D
>>>>>>
>>>>>> =Rob
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Tom van den Berge <to...@gmail.com>.

>
> "one drawback: the node joins the cluster as soon as the bootstrapping
> begins."
> I am not sure I understand this correctly. It will get tokens, but not
> load data if you combine it with autobootstrap=false.
>
Joining the cluster means that all other nodes become aware of the new
node, and therefore it might receive reads. And yes, it will not have any
data, because auto_bootstrap=false.



> How I see it: You should be able to start all the new nodes in the new DC
> with autobootstrap=false and survey-mode=true. Then you should have a new
> DC with nodes that have tokens but no data. Then you can start rebuild on
> all new nodes. During this process, the new nodes should get writes, but
> not serve reads.
>
Maybe you're right.


>
> "It turns out that join_ring=false in this scenario does not solve this
> problem"
> I also don't see how join_ring would help here. (Actually I have no clue
> where you would ever need that option)
>
The idea of join_ring=false is that other nodes are not aware of the new
node, and therefore never send requests to it. The new node can then be
repaired (see https://issues.apache.org/jira/browse/CASSANDRA-6961). To set
up a new DC, I was hoping that you could also rebuild (instead of a repair)
a new node while join_ring=false, but that seems not to work.

>
>
> "Currently I'm trying to auto_bootstrap my new DC. The good thing is that
> it doesn't accept reads from other DCs."
> The joining-state actually works perfectly. The joining state is a state
> where a node takes writes, but does not serve reads. It would be really cool if you
> could boot a node into the joining state. Actually, write_survey should
> basically be the same.
>
It would be great if you could select the DC from where it's bootstrapped,
similar to nodetool rebuild. I'm currently bootstrapping a node in
San-Jose. It decides to stream all data from another DC in Amsterdam, while
we also have another DC in San-Jose, right next to it. Streaming data
across the Atlantic takes a lot more time :(



>
> kind regards,
> Christian
>
> PS: I would love to see the results, if you perform any tests on the
> write-survey. Please share it here on the mailing list :-)
>
>
>
> On Mon, Sep 7, 2015 at 11:10 PM, Tom van den Berge <
> tom.vandenberge@gmail.com> wrote:
>
>> Hi Christian,
>>
>> No, I never tried survey mode. I didn't know about it until now, but from
>> the info I was able to find, it looks like it is meant for a different
>> purpose.
>> Maybe it can be used to bootstrap a new DC, though.
>>
>> On the other hand, the auto_bootstrap=false + rebuild scenario seems to
>> be designed to do exactly what I need, except that it has one drawback: the
>> node joins the cluster as soon as the bootstrapping begins.
>>
>> It turns out that join_ring=false in this scenario does not solve this
>> problem, since nodetool rebuild does not do anything if C* is started with
>> this option.
>>
>> A workaround could be to ensure that only LOCAL_* CL is used by all
>> clients, but even then I'm seeing failed queries, because they're
>> mysteriously routed to the new DC every now and then.
>>
>> Currently I'm trying to auto_bootstrap my new DC. The good thing is that
>> it doesn't accept reads from other DCs. The bad thing is that a) I can't
>> choose where it streams its data from, and b) the two nodes I've been
>> trying to bootstrap crashed when they were almost finished...
>>
>>
>>
>> On Mon, Sep 7, 2015 at 10:22 PM, horschi <ho...@gmail.com> wrote:
>>
>>> Hi Tom,
>>>
>>> this sounds very much like my thread: "auto_bootstrap=false broken?"
>>>
>>> Did you try booting the new node with survey-mode? I wanted to try this,
>>> but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
>>> versions). Imho survey mode is what you (and me too) want: start a node,
>>> accepting writes, but not serving reads. I have not tested it yet, but I
>>> think it should work.
>>>
>>> Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.
>>>
>>> kind regards,
>>> Christian
>>>
>>> On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <to...@drillster.com>
>>> wrote:
>>>
>>>> Running nodetool rebuild on a node that was started with
>>>> join_ring=false does not work, unfortunately. The nodetool command returns
>>>> immediately, after a message appears in the log that the streaming of data
>>>> has started. After that, nothing happens.
>>>>
>>>> Tom
>>>>
>>>>
>>>> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com>
>>>> wrote:
>>>>
>>>>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <to...@drillster.com>
>>>>> wrote:
>>>>>
>>>>>> Wouldn't it be far more efficient if a node that is rebuilding itself
>>>>>> is responsible for not accepting reads until the rebuild is complete? E.g.
>>>>>> by marking it as "Joining", similar to a node that is being bootstrapped?
>>>>>>
>>>>>
>>>>> Yes, and Cassandra 2.0.7 and above contain this long desired
>>>>> functionality.
>>>>>
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>>>>
>>>>> I presume that one can also run a rebuild in this state, though I
>>>>> haven't tried. Driftx gives it an 80% chance... try it and see and let us
>>>>> know? :D
>>>>>
>>>>> =Rob
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by horschi <ho...@gmail.com>.

Hi Tom,

"one drawback: the node joins the cluster as soon as the bootstrapping
begins."
I am not sure I understand this correctly. It will get tokens, but not load
data if you combine it with auto_bootstrap=false.

How I see it: You should be able to start all the new nodes in the new DC
with auto_bootstrap=false and write_survey=true. Then you should have a new
DC with nodes that have tokens but no data. Then you can start rebuild on
all new nodes. During this process, the new nodes should get writes, but
not serve reads.

Disclaimer: I have not tested the combination of the two!
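
The untested plan above could be sketched roughly as follows. This Python
snippet only assembles the steps as strings and runs nothing. The
auto_bootstrap setting, the -Dcassandra.write_survey=true startup property,
nodetool rebuild with a source DC name, and nodetool join all exist in
Cassandra; the function name, the bare "cassandra" launcher call, and the DC
name "Amsterdam" are placeholders, and whether rebuild actually behaves well
in survey mode is exactly the open question.

```python
import shlex


def survey_rebuild_plan(source_dc):
    """Steps for the survey-mode + rebuild idea, per new node.

    Hypothetical helper: returns descriptive strings, executes nothing.
    """
    return [
        # cassandra.yaml: take tokens but skip the bootstrap streaming.
        "set auto_bootstrap: false in cassandra.yaml",
        # Start in write survey mode: accept writes, serve no reads.
        "cassandra -Dcassandra.write_survey=true",
        # Stream the existing data from the chosen source DC.
        "nodetool rebuild -- " + shlex.quote(source_dc),
        # Assumed final step: leave survey mode and join the ring fully.
        "nodetool join",
    ]


for step in survey_rebuild_plan("Amsterdam"):
    print(step)
```

The source DC is passed in explicitly because choosing where the data streams
from is the main advantage of rebuild over a plain bootstrap.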

"It turns out that join_ring=false in this scenario does not solve this
problem"
I also don't see how join_ring would help here. (Actually I have no clue
where you would ever need that option)

"A workaround could be to ensure that only LOCAL_* CL is used by all
clients, but even then I'm seeing failed queries, because they're
mysteriously routed to the new DC every now and then."
Yes, it works fine if you don't make any mistakes. Keep in mind you also have
to make sure your driver does not connect to the other DC. But I agree
with you: it's a workaround for this scenario. To me this does not feel
correct.
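
One way to catch such mistakes is to audit which consistency level each query
uses. The following is a minimal Python sketch, not tied to any driver:
LOCAL_ONE, LOCAL_QUORUM and LOCAL_SERIAL are real Cassandra consistency
levels, but the function name, the (name, level) input format and the sample
query names are assumptions made up for this example.

```python
# Consistency levels that are satisfied within the coordinator's own DC.
DC_LOCAL_LEVELS = {"LOCAL_ONE", "LOCAL_QUORUM", "LOCAL_SERIAL"}


def non_local_statements(statements):
    """Return the (name, level) pairs whose level is not a LOCAL_* one.

    `statements` is an iterable of (query_name, consistency_level) pairs;
    both names are hypothetical labels chosen by the caller.
    """
    return [
        (name, level)
        for name, level in statements
        if level.upper() not in DC_LOCAL_LEVELS
    ]


sample = [
    ("load_user", "LOCAL_QUORUM"),
    ("save_score", "QUORUM"),
    ("list_items", "local_one"),
]
print(non_local_statements(sample))
```

This only covers the consistency level. It says nothing about which DC the
driver picks its coordinators from, which has to be checked separately.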

"Currently I'm trying to auto_bootstrap my new DC. The good thing is that
it doesn't accept reads from other DCs."
The joining state actually works perfectly: it is a state in which a node
takes writes but does not serve reads. It would be really cool if you could
boot a node into the joining state. Actually, write_survey should basically
be the same.

kind regards,
Christian

PS: I would love to see the results, if you perform any tests on the
write-survey. Please share it here on the mailing list :-)

On Mon, Sep 7, 2015 at 11:10 PM, Tom van den Berge <
tom.vandenberge@gmail.com> wrote:

> Hi Christian,
>
> No, I never tried survey mode. I didn't know about it until now, but from
> the info I was able to find, it looks like it is meant for a different
> purpose.
> Maybe it can be used to bootstrap a new DC, though.
>
> On the other hand, the auto_bootstrap=false + rebuild scenario seems to be
> designed to do exactly what I need, except that it has one drawback: the
> node joins the cluster as soon as the bootstrapping begins.
>
> It turns out that join_ring=false in this scenario does not solve this
> problem, since nodetool rebuild does not do anything if C* is started with
> this option.
>
> A workaround could be to ensure that only LOCAL_* CL is used by all
> clients, but even then I'm seeing failed queries, because they're
> mysteriously routed to the new DC every now and then.
>
> Currently I'm trying to auto_bootstrap my new DC. The good thing is that
> it doesn't accept reads from other DCs. The bad thing is that a) I can't
> choose where it streams its data from, and b) the two nodes I've been
> trying to bootstrap crashed when they were almost finished...
>
>
>
> On Mon, Sep 7, 2015 at 10:22 PM, horschi <ho...@gmail.com> wrote:
>
>> Hi Tom,
>>
>> this sounds very much like my thread: "auto_bootstrap=false broken?"
>>
>> Did you try booting the new node with survey-mode? I wanted to try this,
>> but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
>> versions). Imho survey mode is what you (and me too) want: start a node,
>> accepting writes, but not serving reads. I have not tested it yet, but I
>> think it should work.
>>
>> Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.
>>
>> kind regards,
>> Christian
>>
>> On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <to...@drillster.com>
>> wrote:
>>
>>> Running nodetool rebuild on a node that was started with join_ring=false
>>> does not work, unfortunately. The nodetool command returns immediately,
>>> after a message appears in the log that the streaming of data has started.
>>> After that, nothing happens.
>>>
>>> Tom
>>>
>>>
>>> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <to...@drillster.com>
>>>> wrote:
>>>>
>>>>> Wouldn't it be far more efficient if a node that is rebuilding itself
>>>>> is responsible for not accepting reads until the rebuild is complete? E.g.
>>>>> by marking it as "Joining", similar to a node that is being bootstrapped?
>>>>>
>>>>
>>>> Yes, and Cassandra 2.0.7 and above contain this long desired
>>>> functionality.
>>>>
>>>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>>>
>>>> I presume that one can also run a rebuild in this state, though I
>>>> haven't tried. Driftx gives it an 80% chance... try it and see and let us
>>>> know? :D
>>>>
>>>> =Rob
>>>>
>>>>
>>>
>>>
>>>
>>
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by Tom van den Berge <to...@gmail.com>.

Hi Christian,

No, I never tried survey mode. I didn't know about it until now, but from the
info I was able to find, it looks like it is meant for a different purpose.
Maybe it can be used to bootstrap a new DC, though.

On the other hand, the auto_bootstrap=false + rebuild scenario seems to be
designed to do exactly what I need, except that it has one drawback: the
node joins the cluster as soon as the bootstrapping begins.

It turns out that join_ring=false in this scenario does not solve this
problem, since nodetool rebuild does not do anything if C* is started with
this option.

A workaround could be to ensure that only LOCAL_* CL is used by all
clients, but even then I'm seeing failed queries, because they're
mysteriously routed to the new DC every now and then.

Currently I'm trying to auto_bootstrap my new DC. The good thing is that it
doesn't accept reads from other DCs. The bad thing is that a) I can't
choose where it streams its data from, and b) the two nodes I've been
trying to bootstrap crashed when they were almost finished...

On Mon, Sep 7, 2015 at 10:22 PM, horschi <ho...@gmail.com> wrote:

> Hi Tom,
>
> this sounds very much like my thread: "auto_bootstrap=false broken?"
>
> Did you try booting the new node with survey-mode? I wanted to try this,
> but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
> versions). Imho survey mode is what you (and me too) want: start a node,
> accepting writes, but not serving reads. I have not tested it yet, but I
> think it should work.
>
> Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.
>
> kind regards,
> Christian
>
> On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <to...@drillster.com>
> wrote:
>
>> Running nodetool rebuild on a node that was started with join_ring=false
>> does not work, unfortunately. The nodetool command returns immediately,
>> after a message appears in the log that the streaming of data has started.
>> After that, nothing happens.
>>
>> Tom
>>
>>
>> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com>
>> wrote:
>>
>>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <to...@drillster.com>
>>> wrote:
>>>
>>>> Wouldn't it be far more efficient if a node that is rebuilding itself
>>>> is responsible for not accepting reads until the rebuild is complete? E.g.
>>>> by marking it as "Joining", similar to a node that is being bootstrapped?
>>>>
>>>
>>> Yes, and Cassandra 2.0.7 and above contain this long desired
>>> functionality.
>>>
>>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>>
>>> I presume that one can also run a rebuild in this state, though I
>>> haven't tried. Driftx gives it an 80% chance... try it and see and let us
>>> know? :D
>>>
>>> =Rob
>>>
>>>
>>
>>
>>
>

Re: Is it possible to bootstrap the 1st node of a new DC?

Posted by horschi <ho...@gmail.com>.

Hi Tom,

this sounds very much like my thread: "auto_bootstrap=false broken?"

Did you try booting the new node with survey-mode? I wanted to try this,
but I am waiting for 2.0.17 to come out (survey mode is broken in earlier
versions). Imho survey mode is what you (and me too) want: start a node,
accepting writes, but not serving reads. I have not tested it yet, but I
think it should work.

Also the manual join mentioned in CASSANDRA-9667 sounds very interesting.

kind regards,
Christian

On Mon, Sep 7, 2015 at 10:11 PM, Tom van den Berge <to...@drillster.com>
wrote:

> Running nodetool rebuild on a node that was started with join_ring=false
> does not work, unfortunately. The nodetool command returns immediately,
> after a message appears in the log that the streaming of data has started.
> After that, nothing happens.
>
> Tom
>
>
> On Fri, Sep 12, 2014 at 5:47 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Fri, Sep 12, 2014 at 6:57 AM, Tom van den Berge <to...@drillster.com>
>> wrote:
>>
>>> Wouldn't it be far more efficient if a node that is rebuilding itself is
>>> responsible for not accepting reads until the rebuild is complete? E.g. by
>>> marking it as "Joining", similar to a node that is being bootstrapped?
>>>
>>
>> Yes, and Cassandra 2.0.7 and above contain this long desired
>> functionality.
>>
>> https://issues.apache.org/jira/browse/CASSANDRA-6961
>>
>> I presume that one can also run a rebuild in this state, though I haven't
>> tried. Driftx gives it an 80% chance... try it and see and let us know? :D
>>
>> =Rob
>>
>>
>
>
>