Posted to commits@cassandra.apache.org by "Russ Hatch (JIRA)" <ji...@apache.org> on 2014/01/08 22:22:50 UTC

[jira] [Commented] (CASSANDRA-5789) Data not fully replicated with 2 nodes and replication factor 2

    [ https://issues.apache.org/jira/browse/CASSANDRA-5789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865901#comment-13865901 ] 

Russ Hatch commented on CASSANDRA-5789:
---------------------------------------

I have not been able to consistently reproduce the reported issue on 1.2.6.

Using the code sample provided earlier, I did see some NotFoundExceptions raised by pycassa, but I could not confirm that they were raised for rows whose writes had been reported back as successful (CL=1). Something unexpected may be happening in the sample code or in pycassa.

I created an additional Python test with cassandra-dtest and cassandra-dbapi2 and was not able to reproduce any similar issue. I used two nodes local to my machine, with a replication factor of two. Writes and reads were done with CL=1. I used up to 40 threads making writes across the two nodes, with up to 125k records created in each keyspace (so 40 threads, each creating 125k rows at the same time). In some cases I exhausted the resources on my machine enough that Cassandra did not respond to some write requests, but reading back was 100% successful for the rows actually written (checked with a random sample of 100k rows spanning all 40 keyspaces).

I will attach my Python code to this ticket.
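
For illustration only (this is not the code attached to the ticket), the write-then-verify pattern described above can be sketched roughly as follows. The InMemoryStore class is a hypothetical stand-in for a real client such as pycassa or cassandra-dbapi2, and the function names, key format, and default sizes are all assumptions. The point of the sketch is that only keys whose writes were acknowledged are recorded, and only those keys are sampled when reading back.

```python
import random
import threading


class InMemoryStore:
    """Hypothetical stand-in for a Cassandra client; not a real driver API."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, key, value):
        with self._lock:
            self._rows[key] = value

    def get(self, key):
        with self._lock:
            return self._rows.get(key)


def write_rows(store, keyspace, n_rows, acked, acked_lock):
    """Write n_rows rows; record only keys whose write was acknowledged."""
    for i in range(n_rows):
        key = "%s:row%d" % (keyspace, i)
        try:
            store.insert(key, "value%d" % i)
        except Exception:
            # Unacknowledged write: it must not be expected on read-back.
            continue
        with acked_lock:
            acked.append(key)


def run_check(store, n_threads=4, n_rows=100, sample_size=50, seed=0):
    """Write from several threads, then read back a random sample."""
    acked = []
    acked_lock = threading.Lock()
    threads = [
        threading.Thread(
            target=write_rows,
            args=(store, "ks%d" % t, n_rows, acked, acked_lock),
        )
        for t in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    rng = random.Random(seed)
    sample = rng.sample(acked, min(sample_size, len(acked)))
    # Any acknowledged key that cannot be read back is a real discrepancy.
    missing = [key for key in sample if store.get(key) is None]
    return len(acked), missing
```

Calling run_check(InMemoryStore()) returns the number of acknowledged writes and the list of sampled keys that could not be read back. Against a real cluster, the store object would wrap the driver calls and the read would need to treat the driver's not-found error as a missing row.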

> Data not fully replicated with 2 nodes and replication factor 2
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-5789
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5789
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.2, 1.2.6
>         Environment: Official Datastax Cassandra 1.2.6, running on Linux RHEL 6.2.  I've seen the same behavior with Cassandra 1.2.2.
> Sun Java 1.7.0_10-b18 64-bit
> Java heap settings: -Xms8192M -Xmx8192M -Xmn2048M
>            Reporter: James Lee
>            Assignee: Russ Hatch
>         Attachments: CassBugRepro.py, CassTestData.py
>
>
> I'm seeing a problem with a 2-node Cassandra test deployment, where it seems that data isn't being replicated among the nodes as I would expect.
> The setup and test is as follows:
> - Two Cassandra nodes in the cluster (they each have themselves and the other node as seeds in cassandra.yaml).
> - Create 40 keyspaces, each with simple replication strategy and replication factor 2.
> - Populate 125,000 rows into each keyspace, using a pycassa client with a connection pool pointed at both nodes.  These are populated with writes using consistency level of 1.
> - Wait until nodetool on each node reports that there are no hinted handoffs outstanding (see output below).
> - Do random reads of the rows in the keyspaces, again using a pycassa client with a connection pool pointed at both nodes.  These are read using consistency level 1.
> I'm finding that the vast majority of reads are successful, but a small proportion (~0.1%) are returned as Not Found.  If I manually try to look up those keys using cassandra-cli, I see that they are returned when querying one of the nodes, but not when querying the other.  So it seems like some of the rows have simply not been replicated, even though the write for these rows was reported to the client as successful.
> If I reduce the rate at which the test tool initially writes data into the database then I don't see any failed reads, so this seems like a load-related issue.  My understanding is that if all writes were successful and there are no pending hinted handoffs, then the data should be fully-replicated and reads should return it (even with read and write consistency of 1).
> Here's the output from nodetool on the two nodes:
> comet-mvs01:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0              2         0                 0
> RequestResponseStage              0         0         878494         0                 0
> MutationStage                     0         0        2869107         0                 0
> ReadRepairStage                   0         0              0         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0           2208         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MigrationStage                    0         0            994         0                 0
> MemtablePostFlusher               0         0           4399         0                 0
> FlushWriter                       0         0           2264         0               556
> MiscStage                         0         0              0         0                 0
> commitlog_archiver                0         0              0         0                 0
> InternalResponseStage             0         0            153         0                 0
> HintedHandoff                     0         0              2         0                 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                 87655
> _TRACE                       0
> REQUEST_RESPONSE             0
> comet-mvs02:/dsc-cassandra-1.2.6# ./bin/nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> ReadStage                         0         0            868         0                 0
> RequestResponseStage              0         0        3919665         0                 0
> MutationStage                     0         0        8177325         0                 0
> ReadRepairStage                   0         0            113         0                 0
> ReplicateOnWriteStage             0         0              0         0                 0
> GossipStage                       0         0           9624         0                 0
> AntiEntropyStage                  0         0              0         0                 0
> MigrationStage                    0         0           2666         0                 0
> MemtablePostFlusher               0         0           7869         0                 0
> FlushWriter                       0         0           4273         0              1179
> MiscStage                         0         0              0         0                 0
> commitlog_archiver                0         0              0         0                 0
> InternalResponseStage             0         0            215         0                 0
> HintedHandoff                     0         0              8         0                 0
> Message type           Dropped
> RANGE_SLICE                  0
> READ_REPAIR                  0
> BINARY                       0
> READ                         0
> MUTATION                531988
> _TRACE                       0
> REQUEST_RESPONSE             0
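
The "Dropped" section of the tpstats output quoted above is where a node reports messages it discarded, and MUTATION is the only type with a nonzero count on both nodes (87655 and 531988). As a minimal sketch, assuming the two-column layout shown in this 1.2.6 output (other versions may print a different layout), those counts could be extracted like this. The function name parse_dropped and the SAMPLE constant are hypothetical, and SAMPLE simply copies the first node's figures from the quoted output.

```python
def parse_dropped(tpstats_text):
    """Return {message_type: dropped_count} from `nodetool tpstats` text.

    Assumes the layout quoted in this ticket: a "Message type  Dropped"
    header followed by lines of the form "<TYPE> <count>".
    """
    dropped = {}
    in_section = False
    for raw in tpstats_text.splitlines():
        fields = raw.split()
        if fields[:2] == ["Message", "type"]:
            in_section = True
            continue
        if in_section:
            if len(fields) == 2 and fields[1].isdigit():
                dropped[fields[0]] = int(fields[1])
            else:
                # Any other line ends the dropped-message section.
                in_section = False
    return dropped


# Figures copied from the first node's output in the ticket.
SAMPLE = """\
HintedHandoff                     0         0              2         0                 0
Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                  0
BINARY                       0
READ                         0
MUTATION                 87655
_TRACE                       0
REQUEST_RESPONSE             0
"""

counts = parse_dropped(SAMPLE)
nonzero = {name: n for name, n in counts.items() if n > 0}
print(nonzero)
```

Filtering for nonzero counts on each node after a test run gives a quick check of whether writes were dropped under load, independent of the hinted handoff figures.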


