Posted to user@cassandra.apache.org by Віталій Тимчишин <ti...@gmail.com> on 2012/06/07 15:20:53 UTC

Failing operations & repair

Hello.

I am giving some Cassandra presentations in Kyiv and would like to check
that I am telling people the truth :)
Could the community tell me whether the following points are true:
1) An operation that failed (from the client's point of view) may still be
applied to the cluster.
2) The coordinator does not try to "roll back" an operation that failed
because it was processed by fewer nodes than the consistency level requires.
3) Hinted handoff works only for successful operations.
4) Counters are not reliable because of (1).
5) Read repair may help to propagate an operation that failed its
consistency level but was persisted to some nodes.
6) Manual repair is still needed because of (2) and (3).

P.S. If some points apply only to some Cassandra versions, I would be happy
to know that too.
-- 
Best regards,
 Vitalii Tymchyshyn

Re: Failing operations & repair

Posted by Vitalii Tymchyshyn <ti...@gmail.com>.
Hello.

Sure. Here they are:
http://www.slideshare.net/vittim1/practical-cassandra
The slides are in English.
I gave this presentation some time ago at JEEConf and once more
yesterday at a local developers' club.
A video recording (in Russian) should become available at some point, but
it is not up yet.

Best regards, Vitalii Tymchyshyn

13.06.12 02:27, crypto five wrote:
> It would be really great to look at your slides. Do you have any plans 
> to share your presentation?


Re: Failing operations & repair

Posted by crypto five <cr...@gmail.com>.
It would be really great to look at your slides. Do you have any plans to
share your presentation?


Re: Failing operations & repair

Posted by Віталій Тимчишин <ti...@gmail.com>.
Thanks a lot. I was not sure whether the coordinator somehow tries to
"roll back" operations that failed to reach their consistency level.
(Though I could not imagine a way to do this without two-phase commit :) )



-- 
Best regards,
 Vitalii Tymchyshyn

Re: Failing operations & repair

Posted by aaron morton <aa...@thelastpickle.com>.
> I am giving some Cassandra presentations in Kyiv and would like to check that I am telling people the truth :)
Thanks for spreading the word :)

> 1) An operation that failed (from the client's point of view) may still be applied to the cluster

Yes.
If you fail with UnavailableException, it is because, from the coordinator's view of the cluster, fewer than CL nodes are available. So retry. The story is somewhat similar with TimedOutException, except that the write may already have reached some replicas.

> 2) The coordinator does not try to "roll back" an operation that failed because it was processed by fewer nodes than the consistency level requires.

Correct.

> 3) Hinted handoff works only for successful operations.

HH will be stored if the coordinator proceeds with the request.
In 1.X a hint is stored on the coordinator if a replica is down when the request starts, and also if the node does not reply within rpc_timeout.
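
Points 1 to 3 can be pictured with a small toy model. This is only a sketch under stated assumptions: the `Replica` class, the `coordinator_write` function and the two locally defined exception classes are hypothetical stand-ins, not Cassandra's code or any client library's API. It shows a write that fails from the client's side, is still applied on some replicas, is not rolled back, and leaves hints behind.

```python
class UnavailableException(Exception):
    """Coordinator sees fewer live replicas than the consistency level needs."""


class TimedOutException(Exception):
    """Fewer than CL replicas acknowledged the write in time."""


class Replica:
    def __init__(self, name, up=True, acks=True):
        self.name = name
        self.up = up        # liveness as seen by the coordinator
        self.acks = acks    # False models a reply that misses rpc_timeout
        self.data = {}

    def write(self, key, value):
        self.data[key] = value   # the write is applied locally...
        return self.acks         # ...even if the ack never arrives in time


def coordinator_write(replicas, cl, key, value, hints):
    if sum(r.up for r in replicas) < cl:
        # Rejected up front: nothing is sent and nothing is hinted.
        raise UnavailableException()
    acked = 0
    for r in replicas:
        if r.up and r.write(key, value):
            acked += 1
        else:
            # Down at request start, or no reply in time: keep a hint.
            hints.append((r.name, key, value))
    if acked < cl:
        # No rollback: replicas that applied the write keep it.
        raise TimedOutException()


# Hypothetical 3-replica cluster: "b" is slow to answer, "c" is down.
replicas = [Replica("a"), Replica("b", acks=False), Replica("c", up=False)]
hints = []
try:
    coordinator_write(replicas, cl=2, key="k", value="v", hints=hints)
    outcome = "ok"
except TimedOutException:
    outcome = "timed out"

print(outcome, replicas[0].data, [h[0] for h in hints])
```

In this model the client sees a timeout, yet replicas "a" and "b" hold the value and hints exist for "b" and "c".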

> 4) Counters are not reliable because of (1)

If you get a TimedOutException when writing a counter you should not re-send the request, because the increment may already have been applied and a retry could count it twice.
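
The retry advice above could be wrapped in a client-side helper along these lines. It is a minimal sketch: `execute_with_retry` and its `idempotent` flag are hypothetical names, the exception classes are defined locally rather than imported from a real client, and treating UnavailableException as always safe to retry assumes the coordinator rejected the request before sending it anywhere.

```python
class UnavailableException(Exception):
    pass


class TimedOutException(Exception):
    pass


def execute_with_retry(operation, idempotent, max_attempts=3):
    """Run operation(), retrying only when a retry cannot double-apply it."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except UnavailableException:
            # Assumed to be rejected before any replica saw the request.
            if attempt == max_attempts:
                raise
        except TimedOutException:
            # The write may have been applied. Re-sending an ordinary
            # insert is harmless, re-sending a counter increment is not.
            if not idempotent or attempt == max_attempts:
                raise


# Usage: a regular write that times out once, then succeeds.
attempts = []

def flaky_insert():
    attempts.append("insert")
    if len(attempts) < 2:
        raise TimedOutException()
    return "ok"

insert_result = execute_with_retry(flaky_insert, idempotent=True)

# Usage: a counter increment that times out is surfaced to the caller.
counter_attempts = []

def counter_increment():
    counter_attempts.append("incr")
    raise TimedOutException()

try:
    execute_with_retry(counter_increment, idempotent=False)
    counter_outcome = "ok"
except TimedOutException:
    counter_outcome = "not retried"
```

A real helper would probably also wait between attempts; that is left out to keep the sketch short.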

> 5) Read repair may help to propagate an operation that failed its consistency level but was persisted to some nodes.

Yes. It works in the background and by default is only enabled on 10% of requests.
Note that RR is not the same as the Consistency Level for a read. If you work at a CL > ONE, the results from the CL nodes are always compared and differences resolved. RR is concerned with the replicas not involved in the CL read.
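
The difference between the CL read and read repair can be sketched with a toy model. Everything here is an assumption made for illustration: replicas are plain dicts mapping a key to a `(value, timestamp)` pair, the newest timestamp wins, and `read` and `read_repair_chance` are hypothetical names for this sketch only.

```python
import random


def newest(replicas, key):
    """Return the (value, timestamp) pair with the highest timestamp, or None."""
    versions = [r[key] for r in replicas if key in r]
    return max(versions, key=lambda v: v[1]) if versions else None


def read(replicas, key, cl, read_repair_chance=0.1, rng=random):
    involved = replicas[:cl]
    winner = newest(involved, key)
    if winner is not None:
        # Replicas taking part in the CL read are always reconciled.
        for r in involved:
            r[key] = winner
    if rng.random() < read_repair_chance:
        # Read repair: occasionally reconcile the remaining replicas too.
        overall = newest(replicas, key)
        if overall is not None:
            for r in replicas:
                r[key] = overall
    return winner[0] if winner else None


# Replica 0 is stale, replica 1 has the new value, replica 2 missed the write.
replicas = [{"k": ("old", 1)}, {"k": ("new", 2)}, {}]

value = read(replicas, "k", cl=2, read_repair_chance=0.0)
third_after_cl_read = dict(replicas[2])   # still empty: not part of the CL read

read(replicas, "k", cl=2, read_repair_chance=1.0)
third_after_rr = dict(replicas[2])        # now repaired
```

With the chance set to 0 only the two replicas in the CL read converge; with the chance set to 1 the third replica is fixed as well.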

> 6) Manual repair is still needed because of (2) and (3)

Manual repair is *the* way to achieve consistency of data on disk. HH and RR are optimisations designed to reduce the chance of a Digest Mismatch during a read with CL > ONE.
It is also essential for distributing Tombstones before they are purged by compaction.
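
Why tombstones must be distributed before they are purged can be seen in a toy simulation. This is a sketch only: replicas are dicts of `key -> (value, timestamp)`, `repair`, `compact` and `merged_read` are hypothetical helpers, and `gc_grace` is an assumed name for the grace period after which compaction may drop a tombstone.

```python
TOMBSTONE = object()   # marker stored in place of a deleted value


def merged_read(replicas, key):
    """Newest version across all replicas; a tombstone reads as None."""
    versions = [r[key] for r in replicas if key in r]
    if not versions:
        return None
    value, _ = max(versions, key=lambda v: v[1])
    return None if value is TOMBSTONE else value


def repair(replicas):
    """Bring every replica to the newest version of every key."""
    for key in set().union(*replicas):
        winner = max((r[key] for r in replicas if key in r), key=lambda v: v[1])
        for r in replicas:
            r[key] = winner


def compact(replica, now, gc_grace):
    """Drop tombstones older than the grace period."""
    for key, (value, ts) in list(replica.items()):
        if value is TOMBSTONE and now - ts >= gc_grace:
            del replica[key]


def scenario(run_repair):
    replicas = [{"k": ("v", 1)} for _ in range(3)]
    # The delete at timestamp 2 reaches only two of the three replicas.
    replicas[0]["k"] = (TOMBSTONE, 2)
    replicas[1]["k"] = (TOMBSTONE, 2)
    if run_repair:
        repair(replicas)
    for r in replicas:
        compact(r, now=100, gc_grace=10)
    return merged_read(replicas, "k")


with_repair = scenario(run_repair=True)
without_repair = scenario(run_repair=False)
```

In this model the deleted value stays deleted when repair runs before the purge, and comes back from the third replica when it does not.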
> P.S. If some points apply only to some Cassandra versions, I would be happy to know that too.

Assume everything is for version 1.X

Thanks

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
