Posted to user@cassandra.apache.org by sankalp kohli <ko...@gmail.com> on 2019/04/02 17:40:43 UTC

Re: Multi-DC replication and hinted handoff

Are you using OSS C*?

On Fri, Mar 29, 2019 at 1:49 AM Jens Fischer <J....@sonnen.de> wrote:

> Hi,
>
> I have a Cassandra setup with multiple data centres. The vast majority of
> writes are LOCAL_ONE writes to data centre DC-A. One node (let's call this
> node A1) in DC-A has accumulated a large amount of hint files (~100 GB). In
> the logs of this node I see lots of messages like the following:
>
> INFO  [HintsDispatcher:26] 2019-03-28 01:49:25,217
> HintsDispatchExecutor.java:289 - Finished hinted handoff of file
> db485ac6-8acd-4241-9e21-7a2b540459de-1553419324363-1.hints to endpoint
> /10.10.2.55: db485ac6-8acd-4241-9e21-7a2b540459de
>
> The node 10.10.2.55 is in DC-B; let's call this node B1. There is no
> indication whatsoever that B1 was down: nothing in our monitoring, nothing
> in the logs of B1, nothing in the logs of A1. Are there any other
> situations where hints for B1 are stored at A1, other than A1's failure
> detection marking B1 as down? For example, could the reason for the
> hints be that B1 is overloaded and cannot handle the intake from A1?
> Or that the network connection between DC-A and DC-B is too slow?
>
> While researching this I also found the following information on Stack
> Overflow from Ben Slater regarding hints and multi-dc replication:
>
> Another factor here is the consistency level you are using - a LOCAL_*
> consistency level will only require writes to be written to the local DC
> for the operation to be considered a success (and hints will be stored for
> replication to the other DC).
> (…)
> The hints are the records of writes that have been made in one DC that are
> not yet replicated to the other DC (or even nodes within a DC). I think
> your options to avoid them are: (1) write with ALL or QUORUM (not LOCAL_*)
> consistency - this will slow down your writes but will ensure writes go
> into both DCs before the op completes (2) Don't replicate the data to the
> second DC (by setting the replication factor to 0 for the second DC in the
> keyspace definition) (3) Increase the capacity of the second DC so it can
> keep up with the writes (4) Slow down your writes so the second DC can keep
> up.
>
>
> Source: https://stackoverflow.com/a/37382726
>
> This reads as if hints are used for “normal” (async) replication between
> data centres, i.e. hints could show up without any nodes being down
> whatsoever. This could explain what I am seeing. Does anyone know more about
> this? Does that mean I will see hints even if I disable hinted handoff?
>
> Any pointers or help are greatly appreciated!
>
> Thanks in advance
> Jens
>
> Managing directors: Christoph Ostermann (CEO), Oliver Koch, Steffen
> Schneider, Hermann Schweizer.
> Amtsgericht Kempten/Allgäu, registration number: 10655, tax number
> 127/137/50792, VAT ID DE272208908
>
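
The consistency-level point in the quoted Stack Overflow answer can be made concrete with a small calculation. The sketch below is not Cassandra code; it only computes how many replica acknowledgements a coordinator waits for at each consistency level. The function name and the replication factors (3 per data centre) are assumptions, since the thread does not state the real ones. Note that the write itself is still sent to all replicas in every data centre regardless of the level; the level only decides how many acknowledgements are awaited, and for LOCAL_* levels they must come from the coordinator's own data centre.

```python
def required_acks(consistency_level, replication, local_dc):
    """Number of replica acknowledgements the coordinator waits for.

    replication: dict mapping data centre name -> replication factor.
    local_dc: data centre of the coordinator.
    """
    def quorum(rf):
        return rf // 2 + 1

    total_rf = sum(replication.values())
    if consistency_level in ("ONE", "LOCAL_ONE"):
        return 1
    if consistency_level == "LOCAL_QUORUM":
        return quorum(replication[local_dc])
    if consistency_level == "QUORUM":
        return quorum(total_rf)
    if consistency_level == "EACH_QUORUM":
        # A quorum in every data centre that holds replicas.
        return sum(quorum(rf) for rf in replication.values() if rf > 0)
    if consistency_level == "ALL":
        return total_rf
    raise ValueError(f"unsupported consistency level: {consistency_level}")


# Assumed replication factors; the thread does not state the real ones.
replication = {"DC-A": 3, "DC-B": 3}
for level in ("LOCAL_ONE", "LOCAL_QUORUM", "QUORUM", "EACH_QUORUM", "ALL"):
    print(level, required_acks(level, replication, "DC-A"))
```

With these assumed factors, LOCAL_ONE needs a single local acknowledgement, while QUORUM and ALL cannot be satisfied from DC-A alone, which is why they force the write to wait on DC-B.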

Re: Multi-DC replication and hinted handoff

Posted by Jens Fischer <J....@sonnen.de>.
Hi,

An update: I am fairly sure the problem is insufficient bandwidth. I cannot confirm it, because Cassandra does not seem to log debug information when hints are created (only when they are replayed). Once the bandwidth issue is solved, I will try to reproduce the accumulation of hints by artificially limiting the bandwidth.

Best regards
Jens
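
A rough way to sanity-check the bandwidth hypothesis is a back-of-the-envelope estimate of how quickly a backlog builds up when the cross-DC write volume exceeds what the link can carry. The sketch below is only arithmetic; both function names and all rates are hypothetical, and it assumes every byte that does not fit through the link ends up as hint data of the same size, which real hint files will not match exactly.

```python
def backlog_growth_gb_per_hour(write_mb_per_s, link_mb_per_s):
    """Backlog growth when writes destined for the remote DC exceed link capacity."""
    excess_mb_per_s = max(0.0, write_mb_per_s - link_mb_per_s)
    return excess_mb_per_s * 3600 / 1024


def hours_to_accumulate(backlog_gb, write_mb_per_s, link_mb_per_s):
    """Hours until the backlog reaches backlog_gb; infinite if the link keeps up."""
    growth = backlog_growth_gb_per_hour(write_mb_per_s, link_mb_per_s)
    return float("inf") if growth == 0 else backlog_gb / growth


# Hypothetical rates: 12 MB/s of writes for DC-B over a 10 MB/s link.
print(round(backlog_growth_gb_per_hour(12, 10), 2), "GB per hour")
print(round(hours_to_accumulate(100, 12, 10), 1), "hours to reach 100 GB")
```

Under these made-up rates, a shortfall of only 2 MB/s would be enough to reach a backlog of the size reported here (~100 GB) in well under a day.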

Re: Multi-DC replication and hinted handoff

Posted by Stefan Miklosovic <st...@instaclustr.com>.
Hi Jens,

I am reading "Cassandra: The Definitive Guide". Chapter 9, "Reading and
Writing Data", has a section "The Cassandra Write Path" that contains this
sentence:

If a replica does not respond within the timeout, it is presumed to be down
and a hint is stored for the write.

So your node might actually be fine, but it just cannot cope with the load
and replies too late, after the coordinator already has sufficient replies
from other replicas. The coordinator then stores a hint for that write and
that node. I am not sure how this relates to turning off hinted handoff
completely. I can do some tests locally, if time allows, to investigate
various scenarios. There might be some subtle differences ...
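
The behaviour described above can be illustrated with a toy model of a single write. This is a simplified sketch, not Cassandra's implementation: the class, the function, the response times and the 2000 ms timeout are all assumptions made for illustration, and the model ignores details such as the hint window and failure detection. The hinted_handoff_enabled flag loosely mirrors the cassandra.yaml setting of the same name.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    dc: str
    response_ms: float  # hypothetical time until this replica acknowledges


def coordinate_write(replicas, local_dc, required_local_acks, timeout_ms,
                     hinted_handoff_enabled=True):
    """Toy model of one LOCAL_* write; returns (success, hinted_replica_names)."""
    on_time = [r for r in replicas if r.response_ms <= timeout_ms]
    late = [r for r in replicas if r.response_ms > timeout_ms]
    # Only acknowledgements from the coordinator's DC count towards LOCAL_* levels.
    local_acks = sum(1 for r in on_time if r.dc == local_dc)
    success = local_acks >= required_local_acks
    # Replicas that missed the timeout get a hint, even though they were never down.
    hinted = [r.name for r in late] if hinted_handoff_enabled else []
    return success, hinted


replicas = [
    Replica("A1", "DC-A", 5.0),
    Replica("A2", "DC-A", 8.0),
    Replica("B1", "DC-B", 2500.0),  # overloaded or behind a slow link
]
print(coordinate_write(replicas, "DC-A", 1, 2000.0))
print(coordinate_write(replicas, "DC-A", 1, 2000.0, hinted_handoff_enabled=False))
```

In this model the LOCAL_ONE write succeeds either way; the slow replica B1 only produces a hint while the flag is on, and with the flag off no hint is stored and the missed write would have to be repaired by other means.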

Re: Multi-DC replication and hinted handoff

Posted by Jens Fischer <J....@sonnen.de>.
Yes, Apache Cassandra 3.11.2 (no DSE).
