Posted to commits@cassandra.apache.org by "Narendra Sharma (JIRA)" <ji...@apache.org> on 2011/04/20 02:38:05 UTC
[jira] [Commented] (CASSANDRA-2514) batch_mutate operations with CL=LOCAL_QUORUM throw TimeOutException when there aren't sufficient live nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-2514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021875#comment-13021875 ]
Narendra Sharma commented on CASSANDRA-2514:
--------------------------------------------
I think the issue is that DatacenterWriteResponseHandler.assureSufficientLiveNodes does not actually check for live nodes.
DatacenterWriteResponseHandler.assureSufficientLiveNodes operates on writeEndpoints, which contains all of the replica endpoints (possibly more, if nodes are bootstrapping), regardless of whether they are alive.
Either writeEndpoints should exclude dead/unreachable nodes, or DatacenterWriteResponseHandler.assureSufficientLiveNodes should use hintedEndpoints.keySet(), which contains only the live endpoints.
I compared this with WriteResponseHandler.assureSufficientLiveNodes and found that it does use hintedEndpoints.
I am attaching a patch that works for me.
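To make the distinction concrete, here is a small self-contained sketch (not Cassandra code; all class, method, and variable names are illustrative) of the check being proposed: counting only live local-DC endpoints, the way hintedEndpoints.keySet() would, instead of counting all of writeEndpoints:

```java
import java.util.*;

public class QuorumCheckSketch {
    // Illustrative stand-in for assureSufficientLiveNodes:
    // fail fast if fewer live local-DC replicas exist than the quorum needs.
    static void assureSufficientLiveNodes(Set<String> liveEndpoints,
                                          Map<String, String> dcOf,
                                          String localDc,
                                          int blockFor) {
        int live = 0;
        for (String endpoint : liveEndpoints)
            if (localDc.equals(dcOf.get(endpoint)))
                live++;
        if (live < blockFor)
            throw new RuntimeException("UnavailableException: only " + live
                    + " live local replicas, need " + blockFor);
    }

    public static void main(String[] args) {
        // Topology from the report: RF=4, two replicas per DC.
        Map<String, String> dcOf = new HashMap<>();
        dcOf.put("10.17.221.17", "DC1");
        dcOf.put("10.17.221.19", "DC1");
        dcOf.put("10.17.221.18", "DC2");
        dcOf.put("10.16.80.54",  "DC2");

        // LOCAL_QUORUM over 2 replicas in DC1 blocks for 2/2 + 1 = 2.
        int blockFor = 2;

        // A check against all write endpoints (4 >= 2) would wrongly pass...
        System.out.println("writeEndpoints count: " + dcOf.size());

        // ...but only one node is actually up, so the check must throw
        // instead of letting the write time out waiting for dead nodes.
        Set<String> live = Collections.singleton("10.17.221.17");
        try {
            assureSufficientLiveNodes(live, dcOf, "DC1", blockFor);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With only one node up, a check based on live endpoints rejects the write up front with UnavailableException, rather than waiting on dead replicas until the RPC times out.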
> batch_mutate operations with CL=LOCAL_QUORUM throw TimeOutException when there aren't sufficient live nodes
> -----------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-2514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2514
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 0.7.4
> Environment: 1. Cassandra 0.7.4 running on RHEL 5.5
> 2. 2 DC setup
> 3. RF = 4 (DC1 = 2, DC2 = 2)
> 4. CL = LOCAL_QUORUM
> Reporter: Narendra Sharma
> Fix For: 0.7.5
>
>
> We have a 2 DC setup with RF = 4. There are 2 nodes in each DC. Following is the keyspace definition:
> <snip>
> keyspaces:
>     - name: KeyspaceMetadata
>       replica_placement_strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>       strategy_options:
>         DC1 : 2
>         DC2 : 2
>       replication_factor: 4
> </snip>
> I shutdown all except one node and waited for the live node to recognize that other nodes are dead. Following is the nodetool ring output on the live node:
> Address Status State Load Owns Token
> 169579575332184635438912517119426957796
> 10.17.221.19 Down Normal ? 29.20% 49117425183422571410176530597442406739
> 10.17.221.17 Up Normal 81.64 KB 4.41% 56615248844645582918169246064691229930
> 10.16.80.54 Down Normal ? 21.13% 92563519227261352488017033924602789201
> 10.17.221.18 Down Normal ? 45.27% 169579575332184635438912517119426957796
> I expect UnavailableException when I send a batch_mutate request to the node that is up. However, it returned TimeOutException:
> TimedOutException()
> at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:16493)
> at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:916)
> at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:890)
> Following is the cassandra-topology.properties
> # Cassandra Node IP=Data Center:Rack
> 10.17.221.17=DC1:RAC1
> 10.17.221.19=DC1:RAC2
> 10.17.221.18=DC2:RAC1
> 10.16.80.54=DC2:RAC2
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira