Posted to user@cassandra.apache.org by Daniel Hölbling-Inzko <da...@bitmovin.com> on 2017/08/02 08:53:21 UTC

Bootstrapping a new Node with Consistency=ONE

Hi,
It's probably a strange question but I have a heavily read-optimized
payload where data integrity is not a big deal. So to keep latencies low I
am reading with Consistency ONE from my Multi-DC Cluster.
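(For reference, that is just the per-request consistency level, i.e. the
equivalent of

cqlsh> CONSISTENCY ONE;

in a cqlsh session; the clients set the same level through the driver.)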

Now the issue I saw is that I needed to add another Cassandra node (for
redundancy reasons).
Since I want this for redundancy I booted the node and then changed the
replication of my keyspace to include the new node (all nodes have 100% of
the data).
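Concretely, the change was along these lines (the keyspace name is just a
placeholder for my real one):

cqlsh> ALTER KEYSPACE my_keyspace WITH replication =
       {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 2};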

The issue I was seeing is that clients that connected to the new Node
afterwards were seeing incomplete data - so the Key would already be
present, but the columns would all be null values.
I expect this to die down once the node is fully replicated, but in the
meantime a lot of my connected clients were in trouble. (The application
can handle seeing old data - incomplete is another matter altogether)

The total data in question is a negligible 500kb (so nothing that should
really take any amount of time in my opinion but it took a few minutes for
the data to replicate over and I am still not sure everything is replicated
correctly).

Increasing the RF to something higher won't really help as the setup is
dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2
would still be 2 nodes, which means I just can't lose either of them.
Adding a third node is not really cost effective for the current workloads
these nodes need to handle.

Any advice on how to avoid this in the future? Is there a way to start up a
node that does not serve client requests but does replicate data?

greetings Daniel

Re: Bootstrapping a new Node with Consistency=ONE

Posted by kurt greaves <ku...@instaclustr.com>.
Only in this one case might that work (RF == N).

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Wed, Aug 2, 2017 at 10:53 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-inzko@bitmovin.com> wrote:

>
> Any advice on how to avoid this in the future? Is there a way to start up
> a node that does not serve client requests but does replicate data?
>

Would it not work if you first increase the RF and then add the new node?

--
Alex

Re: Bootstrapping a new Node with Consistency=ONE

Posted by kurt greaves <ku...@instaclustr.com>.
You can't just add a new DC and then tell your clients to connect to the
new one (after migrating all the data to it, obviously)? If you can't
achieve that you should probably use GossipingPropertyFileSnitch. Your best
plan is to have the desired RF/redundancy from the start. Changing RF in
production is not fun and can be costly.
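For what it's worth, with GossipingPropertyFileSnitch each node just
declares its own DC and rack in conf/cassandra-rackdc.properties, along
these lines (the values here are only examples):

dc=dc2
rack=rack1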

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
Thanks for the pointers Kurt!

I did increase the RF to N, so that would not have been the issue.
DC migration is also a problem since I am using the GoogleCloudSnitch, so
I'd have to take down the whole DC and restart anew (which would mess with
my clients as they only connect to their local DC).

As I said, this was a small issue here - we were only seeing the issue for
5 minutes. But considering how minuscule the amount of data to replicate
was (400 rows, 500kb in total), I am a bit worried about how to do this
once the load increases.

greetings Daniel

On Wed, 2 Aug 2017 at 11:50 kurt greaves <ku...@instaclustr.com> wrote:

> If you want to change RF on a live system your best bet is through DC
> migration (add another DC with the desired # of nodes and RF), and migrate
> your clients to use that DC. There is a way to boot a node and not join the
> ring, however I don't think it will work for new nodes (have not
> confirmed), also increasing RF in this way would only not be completely
> catastrophic if you were increasing RF to N (num nodes).
>

Re: Bootstrapping a new Node with Consistency=ONE

Posted by kurt greaves <ku...@instaclustr.com>.
If you want to change RF on a live system your best bet is through DC
migration (add another DC with the desired # of nodes and RF), and migrate
your clients to use that DC. There is a way to boot a node and not join the
ring; however, I don't think it will work for new nodes (have not
confirmed). Also, increasing RF in this way would only not be completely
catastrophic if you were increasing RF to N (the number of nodes).
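For reference, the "boot a node and not join the ring" part is the
join_ring startup property; a rough sketch (exact invocation depends on how
you start Cassandra, and as said it may not help a brand-new node):

$ cassandra -Dcassandra.join_ring=false   # start the process without joining the ring
$ nodetool join                           # join later, once you are ready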

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
That makes sense. Thank you so much for pointing that out, Alex.
So, long story short: once I am at the RF I actually want (RF 3 per DC) and
am just adding nodes for capacity, joining the ring will work correctly and
no inconsistencies will appear.
If I just change the RF, the nodes don't have the data yet, so a repair
needs to be run.
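i.e. after bumping the RF, something along the lines of (keyspace name is
just a placeholder):

$ nodetool repair -full my_keyspace

on each node, if I understand it correctly.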

Awesome - thanks so much.

greetings Daniel

On Thu, 3 Aug 2017 at 09:56 Oleksandr Shulgin <ol...@zalando.de>
wrote:

> On Thu, Aug 3, 2017 at 9:33 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-inzko@bitmovin.com> wrote:
>
>> No I set Auto bootstrap to true and the node was UN in nodetool status
>> but when doing a select on the node with ONE I got incomplete data.
>>
>
> What I think is happening here is not related to the new node being added.
>
> When you increase Replication Factor, that does not automatically
> redistribute the existing data.  It just makes other nodes responsible for
> portions of the data they might not really have yet.  So I would expect
> that all your nodes show some inconsistencies, before you run a full repair
> of the ring.
>
> I can fairly easily reproduce it locally with ccm[1], 3 nodes, version
> 3.0.13.
>
> $ ccm status
> Cluster: 'v3013'
> ----------------
> node1: UP
> node3: UP
> node2: UP
>
> $ ccm node1 cqlsh
> cqlsh> create keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 1};
> cqlsh> create table test_rf.t1(id int, data text, primary key(id));
> cqlsh> insert into test_rf.t1(id, data) values(1, 'one');
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> At this point selecting from t1 works correctly on any of the nodes with
> the default CL=ONE.
>
> If we now increase the RF and try reading again, something surprising
> will happen:
>
> cqlsh> alter keyspace test_rf WITH replication = {'class':
> 'NetworkTopologyStrategy', 'datacenter1': 2};
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>
> (0 rows)
>
> And in my test this happens on all nodes at the same time.  Explanation is
> fairly simple: now a different node is responsible for the data that was
> written to only one other node previously.
>
> A repair in this tiny test is trivial:
> cqlsh> CONSISTENCY ALL;
> cqlsh> select * from test_rf.t1;
>
>  id | data
> ----+------
>   1 |  one
>
> (1 rows)
>
> And now the data can be read from any node again, since we did a "full
> repair".
>
> --
> Alex
>
> [1] https://github.com/pcmanus/ccm
>
>

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Thu, Aug 3, 2017 at 9:33 AM, Daniel Hölbling-Inzko <
daniel.hoelbling-inzko@bitmovin.com> wrote:

> No I set Auto bootstrap to true and the node was UN in nodetool status but
> when doing a select on the node with ONE I got incomplete data.
>

What I think is happening here is not related to the new node being added.

When you increase Replication Factor, that does not automatically
redistribute the existing data.  It just makes other nodes responsible for
portions of the data they might not really have yet.  So I would expect
that all your nodes show some inconsistencies, before you run a full repair
of the ring.

I can fairly easily reproduce it locally with ccm[1], 3 nodes, version
3.0.13.

$ ccm status
Cluster: 'v3013'
----------------
node1: UP
node3: UP
node2: UP

$ ccm node1 cqlsh
cqlsh> create keyspace test_rf WITH replication = {'class':
'NetworkTopologyStrategy', 'datacenter1': 1};
cqlsh> create table test_rf.t1(id int, data text, primary key(id));
cqlsh> insert into test_rf.t1(id, data) values(1, 'one');
cqlsh> select * from test_rf.t1;

 id | data
----+------
  1 |  one

(1 rows)

At this point selecting from t1 works correctly on any of the nodes with
the default CL=ONE.

If we now increase the RF and try reading again, something surprising
will happen:

cqlsh> alter keyspace test_rf WITH replication = {'class':
'NetworkTopologyStrategy', 'datacenter1': 2};
cqlsh> select * from test_rf.t1;

 id | data
----+------

(0 rows)

And in my test this happens on all nodes at the same time.  Explanation is
fairly simple: now a different node is responsible for the data that was
written to only one other node previously.

A repair in this tiny test is trivial:
cqlsh> CONSISTENCY ALL;
cqlsh> select * from test_rf.t1;

 id | data
----+------
  1 |  one

(1 rows)

And now the data can be read from any node again, since we did a "full
repair".

--
Alex

[1] https://github.com/pcmanus/ccm

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
No, I set auto_bootstrap to true and the node was UN in nodetool status,
but when doing a select on the node with ONE I got incomplete data.
Jeff Jirsa <jj...@gmail.com> wrote on Thu, 3 Aug 2017 at 09:02:

> "nodetool status" shows node as UN (up normal) instead of UJ (up joining)
>
> What you're describing really sounds odd. Something isn't adding up to me
> but I'm not sure why. You shouldn't be able to query it directly until it's
> bootstrapped as far as I know
>
> Are you sure you're not joining as a seed node? Or with auto bootstrap set
> to false?
>
>
> --
> Jeff Jirsa
>
>
> On Aug 2, 2017, at 11:52 PM, Daniel Hölbling-Inzko <
> daniel.hoelbling-inzko@bitmovin.com> wrote:
>
> Thanks Jeff. How do I determine that bootstrap is finished? Haven't seen
> that anywhere so far.
>
> Reads via storage would be ok as every query would be checked by another
> node too. I was only seeing inconsistencies since clients went directly to
> the node with Consistency ONE
>
> Greetings
> Jeff Jirsa <jj...@gmail.com> wrote on Wed, 2 Aug 2017 at 16:01:
>
>> By the time bootstrap is complete it should be as consistent as the
>> source node - you can change start_native_transport to false to avoid
>> serving clients directly (tcp/9042), but it'll still serve reads via the
>> storage service (tcp/7000), but the guarantee is that data should be
>> consistent by the time bootstrap finishes
>>
>>
>>
>>
>> --
>> Jeff Jirsa
>>
>>
>> > On Aug 2, 2017, at 1:53 AM, Daniel Hölbling-Inzko <
>> daniel.hoelbling-inzko@bitmovin.com> wrote:
>> >
>> > Hi,
>> > It's probably a strange question but I have a heavily read-optimized
>> payload where data integrity is not a big deal. So to keep latencies low I
>> am reading with Consistency ONE from my Multi-DC Cluster.
>> >
>> > Now the issue I saw is that I needed to add another Cassandra node (for
>> redundancy reasons).
>> > Since I want this for redundancy I booted the node and then changed
>> the Replication of my Keyspace to include the new node (all nodes have 100%
>> of the data).
>> >
>> > The issue I was seeing is that clients that connected to the new Node
>> afterwards were seeing incomplete data - so the Key would already be
>> present, but the columns would all be null values.
>> > I expect this to die down once the node is fully replicated, but in the
>> meantime a lot of my connected clients were in trouble. (The application
>> can handle seeing old data - incomplete is another matter altogether)
>> >
>> > The total data in question is a negligible 500kb (so nothing that
>> should really take any amount of time in my opinion but it took a few
>> minutes for the data to replicate over and I am still not sure everything
>> is replicated correctly).
>> >
>> > Increasing the RF to something higher won't really help as the setup is
>> dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2
>> would still be 2 nodes which means I just can't lose either of them.
>> Adding a third node is not really cost effective for the current workloads
>> these nodes need to handle.
>> >
>> > Any advice on how to avoid this in the future? Is there a way to start
>> up a node that does not serve client requests but does replicate data?
>> >
>> > greetings Daniel
>>

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Jeff Jirsa <jj...@gmail.com>.
"nodetool status" shows node as UN (up normal) instead of UJ (up joining)

What you're describing really sounds odd. Something isn't adding up to me but I'm not sure why. You shouldn't be able to query it directly until it's bootstrapped, as far as I know.

Are you sure you're not joining as a seed node? Or with auto_bootstrap set to false?


-- 
Jeff Jirsa


> On Aug 2, 2017, at 11:52 PM, Daniel Hölbling-Inzko <da...@bitmovin.com> wrote:
> 
> Thanks Jeff. How do I determine that bootstrap is finished? Haven't seen that anywhere so far. 
> 
> Reads via storage would be ok as every query would be checked by another node too. I was only seeing inconsistencies since clients went directly to the node with Consistency ONE
> 
> Greetings 
> Jeff Jirsa <jj...@gmail.com> wrote on Wed, 2 Aug 2017 at 16:01:
>> By the time bootstrap is complete it should be as consistent as the source node - you can change start_native_transport to false to avoid serving clients directly (tcp/9042), but it'll still serve reads via the storage service (tcp/7000), but the guarantee is that data should be consistent by the time bootstrap finishes
>> 
>> 
>> 
>> 
>> --
>> Jeff Jirsa
>> 
>> 
>> > On Aug 2, 2017, at 1:53 AM, Daniel Hölbling-Inzko <da...@bitmovin.com> wrote:
>> >
>> > Hi,
>> > It's probably a strange question but I have a heavily read-optimized payload where data integrity is not a big deal. So to keep latencies low I am reading with Consistency ONE from my Multi-DC Cluster.
>> >
>> > Now the issue I saw is that I needed to add another Cassandra node (for redundancy reasons).
>> > Since I want this for redundancy I booted the node and then changed the Replication of my Keyspace to include the new node (all nodes have 100% of the data).
>> >
>> > The issue I was seeing is that clients that connected to the new Node afterwards were seeing incomplete data - so the Key would already be present, but the columns would all be null values.
>> > I expect this to die down once the node is fully replicated, but in the meantime a lot of my connected clients were in trouble. (The application can handle seeing old data - incomplete is another matter altogether)
>> >
>> > The total data in question is a negligible 500kb (so nothing that should really take any amount of time in my opinion but it took a few minutes for the data to replicate over and I am still not sure everything is replicated correctly).
>> >
>> > Increasing the RF to something higher won't really help as the setup is dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2 would still be 2 nodes which means I just can't lose either of them. Adding a third node is not really cost effective for the current workloads these nodes need to handle.
>> >
>> > Any advice on how to avoid this in the future? Is there a way to start up a node that does not serve client requests but does replicate data?
>> >
>> > greetings Daniel
>> 

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
Thanks Jeff. How do I determine that bootstrap is finished? Haven't seen
that anywhere so far.

Reads via storage would be ok as every query would be checked by another
node too. I was only seeing inconsistencies since clients went directly to
the node with Consistency ONE

Greetings
Jeff Jirsa <jj...@gmail.com> wrote on Wed, 2 Aug 2017 at 16:01:

> By the time bootstrap is complete it should be as consistent as the source
> node - you can change start_native_transport to false to avoid serving
> clients directly (tcp/9042), but it'll still serve reads via the storage
> service (tcp/7000), but the guarantee is that data should be consistent by
> the time bootstrap finishes
>
>
>
>
> --
> Jeff Jirsa
>
>
> > On Aug 2, 2017, at 1:53 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-inzko@bitmovin.com> wrote:
> >
> > Hi,
> > It's probably a strange question but I have a heavily read-optimized
> payload where data integrity is not a big deal. So to keep latencies low I
> am reading with Consistency ONE from my Multi-DC Cluster.
> >
> > Now the issue I saw is that I needed to add another Cassandra node (for
> redundancy reasons).
> > Since I want this for redundancy I booted the node and then changed the
> Replication of my Keyspace to include the new node (all nodes have 100% of
> the data).
> >
> > The issue I was seeing is that clients that connected to the new Node
> afterwards were seeing incomplete data - so the Key would already be
> present, but the columns would all be null values.
> > I expect this to die down once the node is fully replicated, but in the
> meantime a lot of my connected clients were in trouble. (The application
> can handle seeing old data - incomplete is another matter altogether)
> >
> > The total data in question is a negligible 500kb (so nothing that should
> really take any amount of time in my opinion but it took a few minutes for
> the data to replicate over and I am still not sure everything is replicated
> correctly).
> >
> > Increasing the RF to something higher won't really help as the setup is
> dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2
> would still be 2 nodes which means I just can't lose either of them.
> Adding a third node is not really cost effective for the current workloads
> these nodes need to handle.
> >
> > Any advice on how to avoid this in the future? Is there a way to start
> up a node that does not serve client requests but does replicate data?
> >
> > greetings Daniel
>

Re: Bootstrapping a new Node with Consistency=ONE

Posted by Jeff Jirsa <jj...@gmail.com>.
By the time bootstrap is complete it should be as consistent as the source node. You can change start_native_transport to false to avoid serving clients directly (tcp/9042); it'll still serve reads via the storage service (tcp/7000), but the guarantee is that data should be consistent by the time bootstrap finishes.
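A rough sketch of what that looks like (yaml setting on the joining node,
then re-enable the client port once bootstrap has finished):

# cassandra.yaml on the new node
start_native_transport: false

$ nodetool enablebinary   # turns the native transport (tcp/9042) back on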




-- 
Jeff Jirsa


> On Aug 2, 2017, at 1:53 AM, Daniel Hölbling-Inzko <da...@bitmovin.com> wrote:
> 
> Hi,
> It's probably a strange question but I have a heavily read-optimized payload where data integrity is not a big deal. So to keep latencies low I am reading with Consistency ONE from my Multi-DC Cluster.
> 
> Now the issue I saw is that I needed to add another Cassandra node (for redundancy reasons). 
> Since I want this for redundancy I booted the node and then changed the Replication of my Keyspace to include the new node (all nodes have 100% of the data).
> 
> The issue I was seeing is that clients that connected to the new Node afterwards were seeing incomplete data - so the Key would already be present, but the columns would all be null values.
> I expect this to die down once the node is fully replicated, but in the meantime a lot of my connected clients were in trouble. (The application can handle seeing old data - incomplete is another matter altogether)
> 
> The total data in question is a negligible 500kb (so nothing that should really take any amount of time in my opinion but it took a few minutes for the data to replicate over and I am still not sure everything is replicated correctly).
> 
> Increasing the RF to something higher won't really help as the setup is dc1: 3; dc2: 2 (I added the second node in dc2). So a LOCAL_QUORUM in dc2 would still be 2 nodes which means I just can't lose either of them. Adding a third node is not really cost effective for the current workloads these nodes need to handle.
> 
> Any advice on how to avoid this in the future? Is there a way to start up a node that does not serve client requests but does replicate data?
> 
> greetings Daniel
