Posted to user@cassandra.apache.org by Steve Lihn <st...@gmail.com> on 2010/05/24 17:47:48 UTC

Cassandra data loss

I am evaluating Cassandra as a candidate for our next-gen database. One of
my colleagues told me that "it's not recommended to use it as your system of
record because it CAN lose data". Can someone who understands the
architecture shed some light on the circumstances under which a Cassandra
cluster can either lose data or become inconsistent (a node in the cluster
crashes, network partitions, I/O glitches, etc.)?

Thanks,
Steve

Re: Cassandra data loss

Posted by Jonathan Ellis <jb...@gmail.com>.
You also need to set CommitLogSync to batch instead of periodic if you
Absolutely Cannot Lose Data.
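
To illustrate the difference between the two CommitLogSync modes, here is a minimal Python sketch. It is a toy model, not Cassandra's code: the CommitLog class, its buffer/disk lists, and the helper function are hypothetical names made up for this example. It assumes that in batch mode a write is acknowledged only after the log has been synced, while in periodic mode the write is acknowledged immediately and synced later by a timer.

```python
class CommitLog:
    """Toy commit log: 'buffer' is lost on a crash, 'disk' survives it."""

    def __init__(self, sync_mode):
        assert sync_mode in ("batch", "periodic")
        self.sync_mode = sync_mode
        self.buffer = []  # appended but not yet synced
        self.disk = []    # durably synced entries

    def sync(self):
        # Stand-in for fsync: move everything buffered onto "disk".
        self.disk.extend(self.buffer)
        self.buffer.clear()

    def write(self, mutation):
        """Append a mutation and return once it would be acknowledged."""
        self.buffer.append(mutation)
        if self.sync_mode == "batch":
            # Batch mode: the acknowledgement waits for the sync.
            self.sync()
        # Periodic mode: acknowledge now; a timer would call sync() later.
        return "ack"

    def crash(self):
        """Simulate a crash: unsynced entries vanish; return what survives."""
        self.buffer.clear()
        return list(self.disk)


def acked_writes_surviving_crash(sync_mode, mutations):
    log = CommitLog(sync_mode)
    for m in mutations:
        log.write(m)
    # Crash before any periodic sync has had a chance to run.
    return log.crash()


print(acked_writes_surviving_crash("batch", ["a", "b"]))     # ['a', 'b']
print(acked_writes_surviving_crash("periodic", ["a", "b"]))  # []
```

For simplicity the sketch syncs on every write in batch mode; a real batch mode would group several writes into one sync. The point is only that with periodic sync an acknowledged write can still be lost if the node crashes before the next sync.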

On Mon, May 24, 2010 at 10:51 AM, Joe Stump <jo...@joestump.net> wrote:
> This is largely FUD. Cassandra lets you choose how consistent you want writes to be. The more consistency you choose, the slower the writes, but with high consistency it's very unlikely that you'll lose data.
>
> That being said, if you write with a consistency level of 0 then, yes, you could lose data. Cassandra's consistency is much like root privileges on Unix systems; it gives you more than enough rope to hang yourself if you so choose.
>
> --Joe



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra data loss

Posted by Joe Stump <jo...@joestump.net>.
On May 24, 2010, at 10:01 AM, Steve Lihn wrote:

> So if I set it up to be strongly consistent, I should have the same level of consistency as a traditional relational DB?

If you use, say, QUORUM as the consistency level, it will ensure that at least 2 out of the 3 replicas have responded that they've saved the data. RDBMS consistency and Cassandra consistency are two different beasts. Just remember that write throughput degrades as you raise the consistency level. A higher level also makes you less tolerant of network partitions.
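
The "2 out of 3" arithmetic can be sketched in a few lines of Python. This is only an illustration under the usual majority-quorum assumption; the function names are hypothetical and not part of any Cassandra API. It also shows the standard overlap rule: a read is guaranteed to touch at least one replica that took the latest write when the number of write acknowledgements plus the number of read replies exceeds the number of replicas.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_acks, read_replies):
    """True if every read set must overlap every write set in one replica."""
    return write_acks + read_replies > replication_factor


n = 3
q = quorum(n)
print(q)                                 # 2
print(read_sees_latest_write(n, q, q))   # True: QUORUM writes + QUORUM reads
print(read_sees_latest_write(n, 1, 1))   # False: ONE + ONE may miss the write
print(n - q)                             # 1 replica may be down at QUORUM
```

The last line is the partition-tolerance trade-off mentioned above: with 3 replicas, QUORUM still works with one replica unreachable, whereas requiring all 3 would tolerate none.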

> On the other hand, what will happen if I set it up as eventually consistent? Will the data become inconsistent after a crash/reboot, similar to the case of asynchronous replication? Is there an automated conflict resolution algorithm in Cassandra (which will likely cause data loss)? Or is human intervention needed?

Everything is eventually consistent in Cassandra. Period. You can read more about the ConsistencyLevel flag for writes/reads on the API wiki page[1]. Data on a single machine is usually not inconsistent once it has hit the commit log (use ConsistencyLevel = ONE to ensure that the write is at least in the commit log on a single node).

On crash/reboot, my understanding is that the node replays its commit log. If the data needs to be on 3 nodes, Cassandra will fire off background processes to fix it (though I think it does this on read, via read repair).
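
As a rough picture of what read repair does, here is a small Python sketch. It is a toy model with hypothetical names, not Cassandra's implementation, and it assumes something not spelled out in this thread: that each value carries a timestamp and that the version with the highest timestamp wins when replicas disagree.

```python
def read_with_repair(replicas, key):
    """Read key from every replica, return the newest value, fix stale copies.

    Each replica is a dict mapping key -> (timestamp, value).
    """
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    # Assumed conflict rule: the highest timestamp wins.
    newest = max(versions, key=lambda v: v[0])
    for r in replicas:
        if r.get(key) != newest:
            r[key] = newest  # push the newest version to a stale replica
    return newest[1]


replicas = [
    {"user:1": (2, "new@example.com")},
    {"user:1": (1, "old@example.com")},  # stale copy
    {},                                  # missed the write entirely
]
print(read_with_repair(replicas, "user:1"))  # new@example.com
print(all(r["user:1"] == (2, "new@example.com") for r in replicas))  # True
```

Under this rule the replicas converge without anyone stepping in, but an older write to the same key is simply overwritten by the newer one.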

No human intervention is needed.

--Joe

[1] http://wiki.apache.org/cassandra/API


Re: Cassandra data loss

Posted by Mark Greene <gr...@gmail.com>.
Ryan King actually has a very nice, short and sweet explanation that cuts
through the FUD:

http://theryanking.com/entries/2010/04/29/potential-consistency/


Re: Cassandra data loss

Posted by Steve Lihn <st...@gmail.com>.
So if I set it up to be strongly consistent, I should have the same level of
consistency as a traditional relational DB?

On the other hand, what will happen if I set it up as eventually consistent?
Will the data become inconsistent after a crash/reboot, similar to the case
of asynchronous replication? Is there an automated conflict resolution
algorithm in Cassandra (which will likely cause data loss)? Or is human
intervention needed?

Steve


Re: Cassandra data loss

Posted by Joe Stump <jo...@joestump.net>.
This is largely FUD. Cassandra lets you choose how consistent you want writes to be. The more consistency you choose, the slower the writes, but with high consistency it's very unlikely that you'll lose data.

That being said, if you write with a consistency level of 0 then, yes, you could lose data. Cassandra's consistency is much like root privileges on Unix systems; it gives you more than enough rope to hang yourself if you so choose.

--Joe
