You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2013/01/04 18:28:12 UTC

[jira] [Commented] (CASSANDRA-5113) RepairCallback breaks CL guarantees

    [ https://issues.apache.org/jira/browse/CASSANDRA-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544044#comment-13544044 ] 

Jonathan Ellis commented on CASSANDRA-5113:
-------------------------------------------

+1 on the fix.

-0 on the rename; the only time we do full-data reads is for repairs so I don't really see an improvement there.

-1 on committing to 1.1; this behavior has been present since at least 1.0 and probably longer, and changes to core StorageProxy code like this have a long track record of causing subtle regressions.
                
> RepairCallback breaks CL guarantees
> -----------------------------------
>
>                 Key: CASSANDRA-5113
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5113
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 1.1.9
>
>         Attachments: 0001-Always-ensure-CL-after-a-digest-mismatch.txt, 0002-Rename-RowRepairResolver-to-RowDataResolver.txt
>
>
> RepairCallback does not validate the consistency level of the query. It seems that this was done on purpose as the comments there says:
> {noformat}
> /**
>  * The main difference between this and ReadCallback is, ReadCallback has a ConsistencyLevel
>  * it needs to achieve.  Repair on the other hand is happy to repair whoever replies within the timeout.
>  */
> {noformat}
> Concretely, the get() method of RepairCallback:
> * waits for all endpoints, even if there is more than strictly required by the CL.
> * if it timeouts, doesn't check it and always return a response.
> * for some reason, it returns null unless there is strictly more than 1 response.
> All of that seems wrong to me. The result of RepairCallback is what is returned to the client in case of a digest mismatch on the first read. So we must ensure that the CL has been reached. Also, returning null where there is 1 response (or none) seems clearly wrong.
> In fact I don't think we need a special callback for this "read all data" phase as it is a "normal" read (the fact we do a first read with digests is just an "optimization"). The only difference between the two phases should be how we resolve the responses (in the first case we have digest and in the 2nd we don't) but that's handled by the resolver.
> So attaching a patch that removes RepairCallback and use ReadCallback instead.  I'm also attaching a 2nd trivial patch that renames RowRepairResolver to RowDataResolver because I think it describe better what this actually do (i.e.  the main goal is to resolve a full data read to answer the right value to the client, repairing inconsistent nodes is secondary).
> The patch is against 1.1, because I think breaking CL guarantees is probably serious enough to warrant pushing this to 1.1.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira