
Posted to user@cassandra.apache.org by Paul Prescod <pa...@ayogo.com> on 2010/04/08 09:55:07 UTC

Write consistency

In this¹ debate, there seemed to be consensus on the following fact:

"In Cassandra, say you use N=3, W=3 & R=1. Let’s say you managed to
only write to replicas A & B, but not C. In this case Cassandra will
return an error to the application saying the write failed- which is
acceptable given than W=3. But Cassandra does not cleanup/rollback the
writes that happened to A & B."

If this is still true (even for ConsistencyLevel.ALL) then I would
like to add it to the API documentation. I'd also be curious whether
there have been discussions about an optional 2PC mode for use on
fast LANs.

 Paul Prescod

¹ http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/
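
The scenario in the quoted passage can be sketched with a toy in-memory
model. This is only an illustration under stated assumptions: the Replica
class and the write_with_level function are hypothetical names, not
Cassandra APIs, and the model simply treats replica C as failing during
the write.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy in-memory replica (hypothetical); `up` simulates availability."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        if not self.up:
            return False
        current = self.data.get(key)
        # Last-write-wins: keep the value with the highest timestamp.
        if current is None or timestamp > current[1]:
            self.data[key] = (value, timestamp)
        return True

    def read(self, key):
        return self.data.get(key)


def write_with_level(replicas, key, value, timestamp, w):
    """Send the write to every replica; report success only if at least
    `w` of them acknowledge.

    There is deliberately no rollback: replicas that applied the write
    keep it even when the call as a whole is reported as failed.
    """
    acks = sum(r.write(key, value, timestamp) for r in replicas)
    return acks >= w


a, b, c = Replica("A"), Replica("B"), Replica("C", up=False)
replicas = [a, b, c]

ok = write_with_level(replicas, "k", "new", timestamp=2, w=3)
print("write reported as:", "success" if ok else "failure")
# An R=1 read served by A or B sees the "failed" value; one served by C does not.
print("A:", a.read("k"), "B:", b.read("k"), "C:", c.read("k"))
```

In this model the client is told the write failed, yet A and B still hold
the new value, which is the behavior the question asks about.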

Re: Write consistency

Posted by Avinash Lakshman <av...@gmail.com>.

Retry is the best option, because read repair will fix the inconsistency
on a subsequent read, and it will fix it using a value from a write that
was reported to the client as failed.

Avinash
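
A rough sketch of the read-repair idea, assuming each replica is a plain
dict mapping a key to a (value, timestamp) pair and that the newest
timestamp wins. The function name read_with_repair is hypothetical, and
the sketch repairs synchronously across all replicas, which is a
simplification rather than a description of Cassandra's implementation.

```python
def read_with_repair(replicas, key):
    """Read `key` from every replica, return the newest value, and push
    it to any replica holding an older (or missing) copy."""
    versions = [r.get(key) for r in replicas]  # each is (value, ts) or None
    newest = max((v for v in versions if v is not None),
                 key=lambda v: v[1], default=None)
    if newest is None:
        return None
    for replica, version in zip(replicas, versions):
        if version != newest:
            replica[key] = newest  # repair the stale replica
    return newest[0]


# A and B took the write that was reported as failed; C did not.
a = {"k": ("new", 2)}
b = {"k": ("new", 2)}
c = {"k": ("old", 1)}

print(read_with_repair([a, b, c], "k"))
print(c["k"])
```

After the read, C holds the same value as A and B, so the value from the
"failed" write is the one that ends up on all three replicas.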

On Thu, Apr 8, 2010 at 9:47 AM, David Strauss <da...@fourkitchens.com> wrote:

> A read repair will fix it immediately after the first read of the row.

Re: Write consistency

Posted by David Strauss <da...@fourkitchens.com>.

A read repair will fix it immediately after the first read of the row.

On 2010-04-08 16:36, Mark Greene wrote:
> So unless you re-try the write, the previous stale write stays on the
> other two nodes? Would a read repair fix this eventually?


-- 
David Strauss
   | david@fourkitchens.com
   | +1 512 577 5827 [mobile]
Four Kitchens
   | http://fourkitchens.com
   | +1 512 454 6659 [office]
   | +1 512 870 8453 [direct]

Re: Write consistency

Posted by Benjamin Black <b...@b3k.us>.

Yes.  Or you would retry the write.  Either way, the system achieves
consistency eventually, hence the name.

On Thu, Apr 8, 2010 at 9:36 AM, Mark Greene <gr...@gmail.com> wrote:
> So unless you re-try the write, the previous stale write stays on the other
> two nodes? Would a read repair fix this eventually?
>

Re: Write consistency

Posted by Mark Greene <gr...@gmail.com>.

So unless you re-try the write, the previous stale write stays on the other
two nodes? Would a read repair fix this eventually?

On Thu, Apr 8, 2010 at 11:36 AM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> What you're describing is a distributed transaction? Generally, strong
> consistency is associated with transactional writes, where you never see
> the results of a failed write on a subsequent read, no matter what
> happens. Cassandra has no notion of rollback. That is why no combination
> will give you strong consistency. The idea is that you retry the failed
> write, and eventually the system will get rid of the previous stale write.
>
> Avinash
>
>
> On Thu, Apr 8, 2010 at 8:29 AM, Jeremy Dunck <jd...@gmail.com> wrote:
>
>> On Thu, Apr 8, 2010 at 7:16 AM, Gary Dusbabek <gd...@gmail.com>
>> wrote:
>> > On Thu, Apr 8, 2010 at 02:55, Paul Prescod <pa...@ayogo.com> wrote:
>> >> In this¹ debate, there seemed to be consensus on the following fact:
>> >>
>> >> "In Cassandra, say you use N=3, W=3 & R=1. Let’s say you managed to
>> >> only write to replicas A & B, but not C. In this case Cassandra will
>> >> return an error to the application saying the write failed- which is
>> >> acceptable given than W=3. But Cassandra does not cleanup/rollback the
>> >> writes that happened to A & B."
>> >>
>> >
>> > correct: no rolling back.  Cassandra does go out of its way to make
>> > sure the cluster is healthy enough to begin the write though.
>>
>> I think the general answer here is don't use R=1 if you can't tolerate
>> inconsistency?  Still the point of confusion -- if W=3 and the write
>> succeeds on 2 nodes but fails the 3rd, the write fails (to the
>> updating client), but is the data on the two successful nodes still
>> readable (i.e. reading from what was actually a failed write)?
>>
>
>

Re: Write consistency

Posted by Avinash Lakshman <av...@gmail.com>.

What you're describing is a distributed transaction? Generally, strong
consistency is associated with transactional writes, where you never see
the results of a failed write on a subsequent read, no matter what
happens. Cassandra has no notion of rollback. That is why no combination
will give you strong consistency. The idea is that you retry the failed
write, and eventually the system will get rid of the previous stale write.

Avinash
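
A minimal sketch of the retry approach, assuming the client supplies the
write timestamp and can pass the same one again on each attempt. The
names write_with_retry, send_write and flaky_send are hypothetical;
send_write stands in for whatever client call performs the write and
returns True on success.

```python
import time


def write_with_retry(send_write, key, value, attempts=3, delay=0.0):
    """Call `send_write(key, value, timestamp)` until it reports success.

    The same timestamp is reused on every attempt, so replicas that
    already applied the write see the retry as the same version rather
    than as a newer one. Returns the number of attempts used.
    """
    timestamp = time.time_ns()
    for attempt in range(1, attempts + 1):
        if send_write(key, value, timestamp):
            return attempt
        time.sleep(delay)
    raise RuntimeError(f"write of {key!r} failed after {attempts} attempts")


# Hypothetical backend: the first call fails, later calls succeed.
calls = []


def flaky_send(key, value, timestamp):
    calls.append(timestamp)
    return len(calls) > 1


attempts_used = write_with_retry(flaky_send, "k", "new")
print("succeeded on attempt", attempts_used)
```

Reusing the timestamp is a design choice in this sketch, not something
stated in the thread; it keeps the retry idempotent under
last-write-wins resolution.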

On Thu, Apr 8, 2010 at 8:29 AM, Jeremy Dunck <jd...@gmail.com> wrote:

> On Thu, Apr 8, 2010 at 7:16 AM, Gary Dusbabek <gd...@gmail.com> wrote:
> > On Thu, Apr 8, 2010 at 02:55, Paul Prescod <pa...@ayogo.com> wrote:
> >> In this¹ debate, there seemed to be consensus on the following fact:
> >>
> >> "In Cassandra, say you use N=3, W=3 & R=1. Let’s say you managed to
> >> only write to replicas A & B, but not C. In this case Cassandra will
> >> return an error to the application saying the write failed- which is
> >> acceptable given than W=3. But Cassandra does not cleanup/rollback the
> >> writes that happened to A & B."
> >>
> >
> > correct: no rolling back.  Cassandra does go out of its way to make
> > sure the cluster is healthy enough to begin the write though.
>
> I think the general answer here is don't use R=1 if you can't tolerate
> inconsistency?  Still the point of confusion -- if W=3 and the write
> succeeds on 2 nodes but fails the 3rd, the write fails (to the
> updating client), but is the data on the two successful nodes still
> readable (i.e. reading from what was actually a failed write)?
>

Re: Write consistency

Posted by Jeremy Dunck <jd...@gmail.com>.

On Thu, Apr 8, 2010 at 7:16 AM, Gary Dusbabek <gd...@gmail.com> wrote:
> On Thu, Apr 8, 2010 at 02:55, Paul Prescod <pa...@ayogo.com> wrote:
>> In this¹ debate, there seemed to be consensus on the following fact:
>>
>> "In Cassandra, say you use N=3, W=3 & R=1. Let’s say you managed to
>> only write to replicas A & B, but not C. In this case Cassandra will
>> return an error to the application saying the write failed- which is
>> acceptable given than W=3. But Cassandra does not cleanup/rollback the
>> writes that happened to A & B."
>>
>
> correct: no rolling back.  Cassandra does go out of its way to make
> sure the cluster is healthy enough to begin the write though.

I think the general answer here is: don't use R=1 if you can't tolerate
inconsistency. Still, one point of confusion remains -- if W=3 and the
write succeeds on 2 nodes but fails on the 3rd, the write fails (as far as
the updating client is concerned), but is the data on the two successful
nodes still readable (i.e. can a read return what was actually a failed
write)?

Re: Write consistency

Posted by Gary Dusbabek <gd...@gmail.com>.

On Thu, Apr 8, 2010 at 02:55, Paul Prescod <pa...@ayogo.com> wrote:
> In this¹ debate, there seemed to be consensus on the following fact:
>
> "In Cassandra, say you use N=3, W=3 & R=1. Let’s say you managed to
> only write to replicas A & B, but not C. In this case Cassandra will
> return an error to the application saying the write failed- which is
> acceptable given than W=3. But Cassandra does not cleanup/rollback the
> writes that happened to A & B."
>

Correct: no rolling back.  Cassandra does go out of its way to make
sure the cluster is healthy enough to begin the write, though.

> If this is still true (even for ConsistencyLevel.ALL) then I would
> like to add it to the API documentation. I'd also be curious about if
> there have been discussions about for an optional 2PC mode for use on
> fast LANs.
>

None recently.  I think 2PC would be at odds with Cassandra's goal to
"keep writes really fast".  That being said, we have ZooKeeper support
in contrib, which would get you most of the way there.  My ZooKeeper
knowledge is pretty weak, but I don't think it would let you
roll back if things went south during the actual commit.

Gary

>  Paul Prescod
>
> ¹ http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/
>

Re: Write consistency

Posted by Benjamin Black <b...@b3k.us>.

On Thu, Apr 8, 2010 at 12:55 AM, Paul Prescod <pa...@ayogo.com> wrote:
>
> ¹ http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/
>

Pay no attention to this disingenuous troll.


b

Re: Write consistency

Posted by Benjamin Black <b...@b3k.us>.

His arguments consistently (hah!) boil down to this: if you
misconfigure things for your intended application, you get undesirable
behavior.  For example, the correct approach to the situation cited is
to use quorum reads and writes.  W=3/R=1/N=3 might be appropriate for
situations in which you want to force writes to a remote datacenter
(using an appropriate placement strategy), but are not concerned with
clients always seeing the same data at any given instant.

b
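
To illustrate why quorum reads and writes fit the cited situation, the
sketch below enumerates which read sets would miss a write that reached
only A and B out of N=3 replicas. With a quorum write (W=2) that write
would be acknowledged as successful. The names REPLICAS and
reads_that_miss are hypothetical and used only for this illustration.

```python
from itertools import combinations

REPLICAS = ("A", "B", "C")


def reads_that_miss(written, r):
    """Return the read sets of size r that contain no replica holding
    the write."""
    return [set(rs) for rs in combinations(REPLICAS, r)
            if not set(rs) & written]


written = {"A", "B"}  # the write reached A and B but not C

print("R=1 misses:", reads_that_miss(written, 1))
print("R=2 misses:", reads_that_miss(written, 2))
```

With R=1 a read served only by C misses the write, while every possible
quorum read (R=2) includes at least one of A or B.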

On Thu, Apr 8, 2010 at 12:55 AM, Paul Prescod <pa...@ayogo.com> wrote:
> In this¹ debate, there seemed to be consensus on the following fact:
>
> "In Cassandra, say you use N=3, W=3 & R=1. Let’s say you managed to
> only write to replicas A & B, but not C. In this case Cassandra will
> return an error to the application saying the write failed- which is
> acceptable given than W=3. But Cassandra does not cleanup/rollback the
> writes that happened to A & B."
>
> If this is still true (even for ConsistencyLevel.ALL) then I would
> like to add it to the API documentation. I'd also be curious about if
> there have been discussions about for an optional 2PC mode for use on
> fast LANs.
>
>  Paul Prescod
>
> ¹ http://jsensarma.com/blog/2009/11/dynamo-part-i-a-followup-and-re-rebuttals/
>