Posted to commits@cassandra.apache.org by "Narendra Sharma (JIRA)" <ji...@apache.org> on 2011/04/20 02:38:05 UTC

[jira] [Commented] (CASSANDRA-2514) batch_mutate operations with CL=LOCAL_QUORUM throw TimeOutException when there aren't sufficient live nodes

    [ https://issues.apache.org/jira/browse/CASSANDRA-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021875#comment-13021875 ] 

Narendra Sharma commented on CASSANDRA-2514:
--------------------------------------------

I think the issue is that DatacenterWriteResponseHandler.assureSufficientLiveNodes does not check whether the endpoints it counts are actually live.

DatacenterWriteResponseHandler.assureSufficientLiveNodes works on writeEndpoints, which contains all the endpoints for the write, live or dead (and possibly more, if nodes are bootstrapping).

I think either writeEndpoints should ignore dead/unreachable nodes, or DatacenterWriteResponseHandler.assureSufficientLiveNodes should use hintedEndpoints.keySet(), since that contains the live endpoints.
I compared the implementation with WriteResponseHandler.assureSufficientLiveNodes and found that it uses hintedEndpoints.
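This is not the attached patch. It is a minimal sketch of why the choice of endpoint set matters, built on a simplified model. Every class and method name in it (LocalQuorumCheck, localQuorum, countInDc, outcome, and the stand-in UnavailableException) is hypothetical, and the "live" set simply plays the role that the comment above attributes to hintedEndpoints.keySet(). LOCAL_QUORUM needs a majority of the local DC's replicas, which is RF/2 + 1 = 2 for DC1 with RF 2. Counting every replica finds 2 in DC1, so the check passes and the write then times out. Counting only live replicas finds 1, so the request is rejected up front.

```java
import java.util.Map;
import java.util.Set;

// Simplified model of a LOCAL_QUORUM availability check. All names here are
// hypothetical stand-ins, not Cassandra's actual classes or method signatures.
public class LocalQuorumCheck {

    // Stand-in for Cassandra's UnavailableException in this sketch.
    static class UnavailableException extends Exception {
        UnavailableException(String message) {
            super(message);
        }
    }

    // LOCAL_QUORUM requires a majority of the replicas in the local data center.
    static int localQuorum(int localReplicationFactor) {
        return localReplicationFactor / 2 + 1;
    }

    // Number of the given endpoints that belong to the local data center.
    static long countInDc(Set<String> endpoints, Map<String, String> dcOf, String localDc) {
        return endpoints.stream().filter(ep -> localDc.equals(dcOf.get(ep))).count();
    }

    // Fails fast when too few of the supplied endpoints are in the local DC.
    // The answer is only meaningful if 'endpoints' contains live nodes only.
    static void assureSufficientLiveNodes(Set<String> endpoints, Map<String, String> dcOf,
                                          String localDc, int localReplicationFactor)
            throws UnavailableException {
        long available = countInDc(endpoints, dcOf, localDc);
        int required = localQuorum(localReplicationFactor);
        if (available < required) {
            throw new UnavailableException(
                "need " + required + " replicas in " + localDc + ", have " + available);
        }
    }

    // Runs the check for DC1 with RF 2 (as in the reported keyspace) and reports the result.
    static String outcome(Set<String> endpoints, Map<String, String> dcOf) {
        try {
            assureSufficientLiveNodes(endpoints, dcOf, "DC1", 2);
            return "passes";
        } catch (UnavailableException e) {
            return "unavailable: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Topology and liveness taken from the report: only 10.17.221.17 (DC1) is up.
        Map<String, String> dcOf = Map.of(
            "10.17.221.17", "DC1", "10.17.221.19", "DC1",
            "10.17.221.18", "DC2", "10.16.80.54", "DC2");
        Set<String> allReplicas = dcOf.keySet();           // like writeEndpoints: live or dead
        Set<String> liveReplicas = Set.of("10.17.221.17");  // assumed live-only set

        // Counting every replica: 2 >= 2, so the check passes and the write later times out.
        System.out.println("all replicas:  " + outcome(allReplicas, dcOf));
        // Counting live replicas only: 1 < 2, so the request is rejected immediately.
        System.out.println("live replicas: " + outcome(liveReplicas, dcOf));
    }
}
```

Running main prints "passes" for the full replica set and "unavailable: need 2 replicas in DC1, have 1" for the live-only set. The second outcome corresponds to the UnavailableException the reporter expected.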


I am attaching the patch that works for me.

> batch_mutate operations with CL=LOCAL_QUORUM throw TimeOutException when there aren't sufficient live nodes
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2514
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2514
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>         Environment: 1. Cassandra 0.7.4 running on RHEL 5.5
> 2. 2 DC setup
> 3. RF = 4 (DC1 = 2, DC2 = 2)
> 4. CL = LOCAL_QUORUM
>            Reporter: Narendra Sharma
>             Fix For: 0.7.5
>
>
> We have a 2 DC setup with RF = 4. There are 2 nodes in each DC. Following is the keyspace definition:
> <snip>
> keyspaces:
>     - name: KeyspaceMetadata
>       replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>       strategy_options:
>         DC1 : 2
>         DC2 : 2
>       replication_factor: 4
> </snip>
> I shut down all nodes except one and waited for the live node to recognize that the other nodes were dead. Following is the nodetool ring output on the live node:
> Address         Status State   Load            Owns    Token                                       
>                                                        169579575332184635438912517119426957796     
> 10.17.221.19    Down   Normal  ?               29.20%  49117425183422571410176530597442406739      
> 10.17.221.17    Up     Normal  81.64 KB        4.41%   56615248844645582918169246064691229930      
> 10.16.80.54     Down   Normal  ?               21.13%  92563519227261352488017033924602789201      
> 10.17.221.18    Down   Normal  ?               45.27%  169579575332184635438912517119426957796     
> I expected an UnavailableException when sending a batch_mutate request to the node that is up. Instead, it returned a TimedOutException:
> TimedOutException()
>     at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16493)
>     at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
>     at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
> Following is the cassandra-topology.properties
> # Cassandra Node IP=Data Center:Rack
> 10.17.221.17=DC1:RAC1
> 10.17.221.19=DC1:RAC2
> 10.17.221.18=DC2:RAC1
> 10.16.80.54=DC2:RAC2
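Combining the topology file with the nodetool ring status shows how many nodes are live in each data center. Below is a minimal sketch that parses the "IP=DC:Rack" format quoted above with java.util.Properties. TopologyTally, dataCenterOf and liveCounts are hypothetical names. This is not how Cassandra's PropertyFileSnitch is implemented, and the live set is copied by hand from the ring output rather than obtained from gossip.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical helper: maps each node IP to its data center and tallies live nodes per DC.
public class TopologyTally {

    static Map<String, String> dataCenterOf(String fileContents) {
        Properties props = new Properties();
        try {
            // Properties.load skips '#' comment lines. The IP keys contain no ':' or spaces,
            // so each key ends at '=' and the value keeps its "DC:Rack" colon.
            props.load(new StringReader(fileContents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> dcOf = new TreeMap<>();
        for (String ip : props.stringPropertyNames()) {
            // The value looks like "DC1:RAC1"; keep only the data center part.
            dcOf.put(ip, props.getProperty(ip).split(":", 2)[0]);
        }
        return dcOf;
    }

    // Returns "live/total" for each data center, sorted by DC name.
    static Map<String, String> liveCounts(Map<String, String> dcOf, Set<String> live) {
        Map<String, int[]> counts = new TreeMap<>();
        for (Map.Entry<String, String> e : dcOf.entrySet()) {
            int[] c = counts.computeIfAbsent(e.getValue(), k -> new int[2]);
            c[1]++;                                 // total nodes in this DC
            if (live.contains(e.getKey())) c[0]++;  // live nodes in this DC
        }
        Map<String, String> out = new TreeMap<>();
        counts.forEach((dc, c) -> out.put(dc, c[0] + "/" + c[1]));
        return out;
    }

    public static void main(String[] args) {
        String topology = String.join("\n",
            "# Cassandra Node IP=Data Center:Rack",
            "10.17.221.17=DC1:RAC1",
            "10.17.221.19=DC1:RAC2",
            "10.17.221.18=DC2:RAC1",
            "10.16.80.54=DC2:RAC2");
        Map<String, String> dcOf = dataCenterOf(topology);
        // Only 10.17.221.17 is reported Up by nodetool ring.
        System.out.println(liveCounts(dcOf, Set.of("10.17.221.17")));
    }
}
```

For the reported cluster, this prints {DC1=1/2, DC2=0/2}. Only one of DC1's two replicas is up, which is fewer than the two that LOCAL_QUORUM requires with DC1 : 2.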
