Posted to user@cassandra.apache.org by Alex Durgin <al...@gmail.com> on 2010/03/20 19:09:19 UTC

data corruption experiences

Recent messages to the list regarding durability and backup strategies
lead me to a few questions that other new users may also have.

What's the general experience with corruption to date?

Is it common?

Would I regret operating a single node cluster?

Digg mentioned sending snapshots to HDFS. Is keeping snapshots local
to each node in a cluster deployment with replicas a bad idea?

Re: data corruption experiences

Posted by Alex Durgin <al...@gmail.com>.

Thanks for the input.

My primary draw to Cassandra is its dynamic schema. I could make it work
relationally, perhaps even nicely with something like Postgres'
hstore, but I haven't investigated that fully yet. Relatively linear
scaling has its appeal and competitive advantages too. I also find
the idea of Lucandra for secondary indexes very appealing. Perhaps
with Drizzle integration coming down the road some of the niceties of
SQL can be enjoyed.

On Mar 20, 2010, at 1:16 PM, Chris Goffinet <go...@digg.com> wrote:

> We saw corruption in the pre-0.4 days. Digg hasn't seen corruption since
> that got taken care of. We are only doing this "just in case the shit
> hits the fan". Cassandra is rapidly changing and it would be completely
> careless of us to go down the path of using a new database as our
> primary datastore without that safety net. Our operations team has
> learned from years of experience that you never want to be in that
> situation, especially with a new datastore.
>
> Single node cluster? Why Cassandra if you plan to use just one
> node? In that case, a simple RAID solution would work fine.
>
> -Chris

Re: data corruption experiences

Posted by Chris Goffinet <go...@digg.com>.

We saw corruption in the pre-0.4 days. Digg hasn't seen corruption since that got taken care of. We are only doing this "just in case the shit hits the fan". Cassandra is rapidly changing and it would be completely careless of us to go down the path of using a new database as our primary datastore without that safety net. Our operations team has learned from years of experience that you never want to be in that situation, especially with a new datastore.

Single node cluster? Why Cassandra if you plan to use just one node? In that case, a simple RAID solution would work fine.

-Chris
