Posted to user@cassandra.apache.org by preetika tyagi <pr...@gmail.com> on 2017/04/05 16:22:29 UTC

how to recover a dead node using commit log when memtable is lost

Hi,

I read in the Cassandra architecture documentation that if a node dies
while there is data in the memtable that hasn't yet been written to an
SSTable, a commit log replay happens when the node restarts (assuming the
commit log had been flushed to disk), and hence the data can be recovered.

However, I was wondering about the case where a node is permanently dead,
with consistency level ONE (replication factor 3, but say the node dies
right after it finishes a request, so the data hasn't been replicated to
the other nodes yet), and the data is only available in the commit log on
that node. Is there a way to recover the data from this node (both its
SSTables and its commit log) so that we can use it on a replacement node,
where we could replay the commit log to recover the data?

Thanks,
Preetika

Re: how to recover a dead node using commit log when memtable is lost

Posted by preetika tyagi <pr...@gmail.com>.
Assuming we are using periodic mode for commit log sync.

On Wed, Apr 5, 2017 at 3:29 PM, preetika tyagi <pr...@gmail.com>
wrote:

> Very good explanation.
> One follow-up question: if CL is set to ONE and RF to 3, then there is a
> chance of the data being lost if the machine crashes before replication
> happens and the commit log (on the node that processed the write at
> CL=ONE) has not been synced yet. Right?
>
> Thanks,
> Preetika
>

Re: how to recover a dead node using commit log when memtable is lost

Posted by preetika tyagi <pr...@gmail.com>.
Very good explanation.
One follow-up question: if CL is set to ONE and RF to 3, then there is a
chance of the data being lost if the machine crashes before replication
happens and the commit log (on the node that processed the write at
CL=ONE) has not been synced yet. Right?

Thanks,
Preetika

On Wed, Apr 5, 2017 at 1:17 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> I beg to differ with @Matija here. By default, Cassandra syncs data into
> the commit log periodically, with an fsync period of 10 seconds (Ref -
> https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L361).
> If a write has not yet been fsynced to disk, RF is 1 (or CL is
> LOCAL_ONE), and the node goes down, then there can be data loss even
> though the client would expect the data to be present.
>
> Therefore a good strategy would be to either use RF 3 and write at
> QUORUM or, if that's not feasible, use batch mode for commitlog sync.
> Batch mode can cause much higher disk IO overhead: say you fsync every
> 10 ms in batch mode; write latency will then be up to 10 ms, since write
> threads are blocked until the fsync completes. Assuming continuous
> writes, you would be issuing 1000/10 = 100 write IOs per second (100
> IOPS). If the window is reduced to 1 ms to bring write latency down to
> 1 ms, IOPS becomes 1000.
>
> So with batch mode it gets tricky to balance latency and disk
> utilisation. Testing this setting thoroughly in a dev environment is
> recommended, as it can adversely affect performance. We did some
> benchmarks and found 50 ms to be ideal for our use case, but that's
> subjective, since it leads to write latencies in excess of 50 ms, which
> could be far too high for some use cases. Still, with modern SSDs the
> batch option can be worthwhile to experiment with.
>
> A good description is also given here -
> http://stackoverflow.com/a/31033900/3646120
>

Re: how to recover a dead node using commit log when memtable is lost

Posted by Bhuvan Rawal <bh...@gmail.com>.
I beg to differ with @Matija here. By default, Cassandra syncs data into
the commit log periodically, with an fsync period of 10 seconds (Ref -
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L361).
If a write has not yet been fsynced to disk, RF is 1 (or CL is LOCAL_ONE),
and the node goes down, then there can be data loss even though the client
would expect the data to be present.
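
For reference, the two sync modes discussed in this thread are set in the
cassandra.yaml linked above. A minimal sketch (the 10000 ms period is the
stated default; the 2 ms batch window is illustrative):

```yaml
# Periodic mode (the default): writes are acknowledged before the fsync,
# so up to commitlog_sync_period_in_ms worth of acknowledged writes can
# be lost if the node crashes before the next sync.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Batch mode (alternative): writes are not acknowledged until the commit
# log has been fsynced; latency and disk IOPS are governed by the window.
# commitlog_sync: batch
# commitlog_sync_batch_window_in_ms: 2
```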

Therefore a good strategy would be to either use RF 3 and write at QUORUM
or, if that's not feasible, use batch mode for commitlog sync. Batch mode
can cause much higher disk IO overhead: say you fsync every 10 ms in batch
mode; write latency will then be up to 10 ms, since write threads are
blocked until the fsync completes. Assuming continuous writes, you would
be issuing 1000/10 = 100 write IOs per second (100 IOPS). If the window is
reduced to 1 ms to bring write latency down to 1 ms, IOPS becomes 1000.
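
The arithmetic above is just a back-of-the-envelope estimate, which can be
sketched as:

```python
# With batch commitlog sync and continuous writes, an fsync window of
# W milliseconds implies roughly 1000 / W fsyncs (write IOs) per second,
# and a worst-case write latency of about W milliseconds.

def batch_sync_iops(window_ms: float) -> float:
    """Approximate write IOPS implied by a batch-sync window."""
    return 1000.0 / window_ms

for window in (10, 1, 50):
    print(f"{window} ms window -> ~{batch_sync_iops(window):.0f} IOPS")
# 10 ms window -> ~100 IOPS
# 1 ms window -> ~1000 IOPS
# 50 ms window -> ~20 IOPS
```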

So with batch mode it gets tricky to balance latency and disk utilisation.
Testing this setting thoroughly in a dev environment is recommended, as it
can adversely affect performance. We did some benchmarks and found 50 ms
to be ideal for our use case, but that's subjective, since it leads to
write latencies in excess of 50 ms, which could be far too high for some
use cases. Still, with modern SSDs the batch option can be worthwhile to
experiment with.

A good description is also given here -
http://stackoverflow.com/a/31033900/3646120

On Thu, Apr 6, 2017 at 12:30 AM, Matija Gobec <ma...@gmail.com> wrote:

> Flushes have nothing to do with data persistence and node failure. Each
> write is acknowledged only when the data has been written to the commit
> log AND the memtable. That addresses node failures and data consistency.
> When the node boots back up, it replays the commit log files, so you
> don't lose data that was already written to that node.

Re: how to recover a dead node using commit log when memtable is lost

Posted by Matija Gobec <ma...@gmail.com>.
Flushes have nothing to do with data persistence and node failure. Each
write is acknowledged only when the data has been written to the commit
log AND the memtable. That addresses node failures and data consistency.
When the node boots back up, it replays the commit log files, so you don't
lose data that was already written to that node.
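
As a toy illustration of the guarantee described above (a simplified
model, not Cassandra's actual write path): a crash wipes the memtable, but
replaying the commit log on restart rebuilds it.

```python
class Node:
    """Toy model: durable commit log plus volatile memtable."""

    def __init__(self):
        self.commit_log = []   # append-only; models the on-disk log
        self.memtable = {}     # in-memory; lost on crash

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. append to commit log
        self.memtable[key] = value            # 2. apply to memtable
        return "ACK"                          # 3. only now acknowledge

    def crash(self):
        self.memtable = {}  # memtable contents are lost; the log survives

    def restart(self):
        # Replay the commit log to rebuild the memtable.
        for key, value in self.commit_log:
            self.memtable[key] = value

node = Node()
node.write("k1", "v1")
node.crash()
assert "k1" not in node.memtable  # in-memory data gone after the crash
node.restart()
assert node.memtable["k1"] == "v1"  # recovered via commit log replay
```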

On Wed, Apr 5, 2017 at 6:22 PM, preetika tyagi <pr...@gmail.com>
wrote:

> Hi,
>
> I read in the Cassandra architecture documentation that if a node dies
> while there is data in the memtable that hasn't yet been written to an
> SSTable, a commit log replay happens when the node restarts (assuming the
> commit log had been flushed to disk), and hence the data can be recovered.
>
> However, I was wondering about the case where a node is permanently dead,
> with consistency level ONE (replication factor 3, but say the node dies
> right after it finishes a request, so the data hasn't been replicated to
> the other nodes yet), and the data is only available in the commit log on
> that node. Is there a way to recover the data from this node (both its
> SSTables and its commit log) so that we can use it on a replacement node,
> where we could replay the commit log to recover the data?
>
> Thanks,
> Preetika
>