Posted to user@cassandra.apache.org by Kamal Bahadur <ma...@gmail.com> on 2011/12/28 23:05:53 UTC

Consistency Level

Hi All,

My Cassandra cluster has 4 nodes with an RF of 2. I am trying to verify that
my data gets replicated to 2 nodes with a write consistency level of ONE.
All the tests that I have done so far tell me that the data is not getting
replicated for some reason.

I executed the getendpoints command to find out the nodes where my record
lives, kept those two nodes running, and brought the other nodes down. When
I try to read the record using Hector, I get this exception: "May not be
enough replicas present to handle consistency level"

I tried to read data using cassandra-cli but I am getting "null".

I ran a manual repair, but I am still getting the same exception. I noticed
that as soon as the number of active nodes drops below 3, I get this
exception.

With consistency level ONE, I would assume that with just one node up and
running (of course the one that has the data) I should get my data back.
But this is not happening.

Will the read repair happen automatically even if I read and write using
the consistency level ONE?

Any help will be much appreciated.

Thanks,
Kamal

Environment details:

Cluster: *4 nodes*
RF: *2*
Hector: *1.0-1*
Cassandra: *0.8.6*
Read CL: *ONE*
Write CL: *ONE*

Output of describe keyspace:

Keyspace: MyKF:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:2]
  Column Families:
    ColumnFamily: MyCF
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator: org.apache.cassandra.db.marshal.BytesType
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 0.5812499999999999/1440/124 (millions of ops/minutes/MB)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: [MyCF.MyCF_appId_idx, MyCF.MyCF_bcId_idx]
      Column Metadata:
        Column Name: .appId
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
          Index Name: MyCF_appId_idx
          Index Type: KEYS
        Column Name: .bcId
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type
          Index Name: MyCF_bcId_idx
          Index Type: KEYS

My hector code:

public class MyHectorTemplate extends HectorTemplateImpl {
    public void init() {
        CassandraHostConfigurator cassandraHostConfigurator =
                new CassandraHostConfigurator(servers);
        cassandraHostConfigurator.setLoadBalancingPolicy(new LeastActiveBalancingPolicy());
        ThriftCluster cluster = new ThriftCluster(
                configuration.getString("cassandra_cluster_name"),
                cassandraHostConfigurator);
        setCluster(cluster);
        setKeyspaceName(configuration.getString("cassandra_keyspace_name"));
        ConfigurableConsistencyLevel configurableConsistencyLevelPolicy =
                new ConfigurableConsistencyLevel();
        configurableConsistencyLevelPolicy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
        setConfigurableConsistencyLevelPolicy(configurableConsistencyLevelPolicy);
        super.init();
        logger.info("------------------------------------------->"
                + getConfigurableConsistencyLevelPolicy().get(OperationType.READ).toString());
    }
}

Here is the log on one of the nodes:

DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,631 ClientState.java (line 87) logged in: #<User allow_all groups=[]>
DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,651 CassandraServer.java (line 670) scan
DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,665 StorageProxy.java (line 889) restricted ranges for query [-1,-1] are [[-1,0], (0,42535295865117307932921825928971026432], (42535295865117307932921825928971026432,85070591730234615865843651857942052864], (85070591730234615865843651857942052864,127605887595351923798765477786913079296], (127605887595351923798765477786913079296,-1]]
DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,666 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,679 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to
DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,679 ReadCallback.java (line 203) Live nodes do not satisfy ConsistencyLevel (1 required)
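One detail worth flagging in the configuration above (an editorial observation, not something confirmed in this thread): the code only sets the default *read* consistency level. If Hector's ConfigurableConsistencyLevel falls back to QUORUM for operation types that are not configured, then the write CL of ONE listed in the environment details would need to be set explicitly as well. A hedged sketch against the Hector 1.0-1 API:

```java
// Hypothetical extension of the init() above. The assumption here is that
// ConfigurableConsistencyLevel defaults unconfigured operation types to
// QUORUM, so setting only the read CL may leave writes at QUORUM.
ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE); // explicit write CL
setConfigurableConsistencyLevelPolicy(ccl);
```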

Re: Consistency Level

Posted by Peter Schuller <pe...@infidyne.com>.
> exception "May not be enough replicas present to handle consistency level"

Check for mistakes in using getendpoints. Cassandra returns Unavailable
when there are not enough replicas *IN THE REPLICA SET FOR THE ROW KEY*
to satisfy the consistency level.
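The replica set Peter refers to can be illustrated with a small sketch of SimpleStrategy-style placement. This is illustrative only, not Cassandra's code; the toy token values and node indices are made up:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of SimpleStrategy placement: a row is stored on the
// first node whose token is >= the row's token (wrapping past the maximum
// token back to the minimum), plus the next RF-1 nodes clockwise.
class ReplicaSetSketch {
    static List<Integer> replicasFor(BigInteger rowToken,
                                     List<BigInteger> sortedNodeTokens,
                                     int rf) {
        int n = sortedNodeTokens.size();
        int primary = 0;
        // find the first node token >= the row token
        while (primary < n
                && sortedNodeTokens.get(primary).compareTo(rowToken) < 0) {
            primary++;
        }
        if (primary == n) {
            primary = 0; // wrapped past the highest token
        }
        List<Integer> replicas = new ArrayList<>();
        for (int i = 0; i < rf; i++) {
            replicas.add((primary + i) % n); // clockwise neighbours
        }
        return replicas;
    }

    public static void main(String[] args) {
        // 4-node ring with toy tokens 0,10,20,30 and RF 2, as in this thread
        List<BigInteger> ring = List.of(
                BigInteger.valueOf(0), BigInteger.valueOf(10),
                BigInteger.valueOf(20), BigInteger.valueOf(30));
        System.out.println(replicasFor(BigInteger.valueOf(15), ring, 2)); // [2, 3]
    }
}
```

At CL ONE the read succeeds only if at least one of those RF nodes is up; nodes outside the replica set do not count.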

> I tried to read data using cassandra-cli but I am getting "null".

This is just cassandra-cli quirkiness IIRC; I think you get "null" on
exceptions.

> With consistency level ONE, I would assume that with just one node up and
> running (of course the one that has the data) I should get my data back. But
> this is not happening.

1 node among the ones in the replica set of your row has to be up.

> Will the read repair happen automatically even if I read and write using the
> consistency level ONE?

Yes, assuming it's turned on.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Consistency Level

Posted by Kamal Bahadur <ma...@gmail.com>.
This is what I was trying to understand. Thanks for your time, Aaron! I
could not find any documentation on the web that explains secondary
indexes the way you have. Great job!

On Thu, Jan 5, 2012 at 3:18 AM, aaron morton <aa...@thelastpickle.com>wrote:

> I missed a ! in the code :) The query will break the token ring into
> ranges based on the node tokens and then find the UP nodes for each range.
>
> I've taken another walk through the code, the logs helped.
>
> In short, you do not have enough UP nodes to support an indexed get at CL
> ONE. It is working by design, and you *should* have gotten an
> UnavailableException returned. There must be CL up replicas for each token
> range. In your test node 200.190 is down and so is the next node; with RF 2
> this means there are no replicas for that range. The log line below is
> logged just before the UnavailableException is raised
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
>
>
> You will need at least every RF'th node UP. Another way to look at it is:
> if you have RF contiguous nodes DOWN you cannot perform an indexed get.
>
> If you are interested this is what the logs are saying…
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
> 976) scan ranges are
> [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
>
> There are 4 token ranges to query, i.e. we have to make 4 reads to query
> over the whole cluster.
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.202.118
>
> Starting to read for the first token range. A bug in 0.8.6 makes it read
> from 202.118 when it does not need to.
>
>
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325704860925009,])
>
> Scanned the secondary index on 200.130 and found an entry for the row
> key 7a7a32323636373030303438303031 that matched the index expression.
>
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
>
> Returning ZERO rows for the query result, because the row key we read
> above has the token 111413491371349413596553235966977111575L, which is not
> in the first token range from above
> (0,42535295865117307932921825928971026432], and this is the range we
> are interested in now.
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /
> 172.16.202.118
>
> Processing the second range now. There is only one node up for this range,
> 202.118
>
> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 173@/172.16.202.118
>
> Got the callback from 202.118 for both the query ranges.
>
> The logs on 202.118 show the same local query. But I'm a little
> confused as to why the row exists on node 2 at all.
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to
>
> Moving on, time to process the third token range
> (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
>
> Oh noes there are no nodes available for that token range. Throw
> UnavailableException
>
> Hope that helps.
>
>
>   -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote:
>
> Hi Aaron,
>
> Thanks for your response!
>
> I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1
> contains the data, CL ONE and RF 2). I was connected to Node 1 while I ran
> the test. I still did not get any data. See below logs:
>
> -----------------------------------------------------------
> nodetool ring:
> -----------------------------------------------------------
>
> Address         DC          Rack        Status State   Load       Owns    Token
>                                                                           127605887595351923798765477786913079296
> xxx.16.200.130  datacenter1 rack1       Up     Normal  584.97 KB  25.00%  0
> xxx.16.202.118  datacenter1 rack1       Up     Normal  570.91 KB  25.00%  42535295865117307932921825928971026432
> xxx.16.200.190  datacenter1 rack1       Down   Normal  593.51 KB  25.00%  85070591730234615865843651857942052864
> xxx.16.200.217  datacenter1 rack1       Down   Normal  617.23 KB  25.00%  127605887595351923798765477786913079296
>
> -----------------------------------------------------
> Logs on Node 1
> -----------------------------------------------------
> DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line
> 148) Version is now 2
>  INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java
> (line 314) Endpoint /172.16.200.190 died before hint delivery, aborting
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
> (line 303) Checking remote schema before delivering hints
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
> (line 295) schema for /172.16.200.217 matches local schema
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
> (line 309) Sleeping 17296ms to stagger hint delivery
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java
> (line 670) scan
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
> 889) restricted ranges for query [-1,-1] are [[-1,0],
> (0,42535295865117307932921825928971026432],
> (42535295865117307932921825928971026432,85070591730234615865843651857942052864],
> (85070591730234615865843651857942052864,127605887595351923798765477786913079296],
> (127605887595351923798765477786913079296,-1]]
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
> 976) scan ranges are
> [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
> DEBUG [Thread-7] 2012-01-04 13:44:00,876 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.202.118
> DEBUG [WRITE-/172.16.202.118] 2012-01-04 13:44:00,884
> OutboundTcpConnection.java (line 162) attempting to connect to /
> 172.16.202.118
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325704860925009,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
> DEBUG [RequestResponseStage:1] 2012-01-04 13:44:00,895
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 171@/172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /
> 172.16.202.118
> DEBUG [Thread-7] 2012-01-04 13:44:00,912 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [Thread-7] 2012-01-04 13:44:00,913 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 173@/172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
> -----------------------------------------------------
> Logs on Node 2
> -----------------------------------------------------
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 Gossiper.java (line 197) Setting
> version 2 for /172.16.200.130
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 IncomingTcpConnection.java (line
> 105) set version for /172.16.200.130 to 2
> DEBUG [Thread-6] 2012-01-04 13:44:00,895 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,907 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,909 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,910 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,910 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 172@/172.16.200.130
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 173@/172.16.200.130
> DEBUG [Thread-4] 2012-01-04 13:44:01,664 IncomingTcpConnection.java (line
> 148) Version is now 2
>
>
>
> On Wed, Jan 4, 2012 at 1:15 AM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> I've not spent much time with the secondary indexes, so a couple of
>> questions.
>>
>> Whats is the output of nodetool ring ?
>> Which node were you connected to when you did the get ?
>> If you enable DEBUG logging what do the log messages from StorageProxy
>> say that contain the string "scan ranges are" and "reading .. from ..."
>>
>> Now for the wild guessing… It's working as designed for a CL ONE
>> request. Looking at test case 5 and *assuming* you were connected to node 2
>> this is what I think is happening: A get indexed slice query without a
>> start key does not know which nodes will contain the data. Reading the code
>> it will consider the replicas for the minimum token as nodes to send the
>> query to, for a CL ONE query it will only use one. If you were connected to
>> node 2 the query would have executed only on node 2.
>>
>> This is where I get confused. What happens if you have 50 nodes, with RF
>> 3, and you execute a get_indexed_slice at QUORUM with no start_key, and the
>> only rows that satisfy the query exist on nodes 47, 48 and 49? They are
>> a long way from the minimum token, assuming SimpleStrategy and a well
>> ordered token ring.
>>
>> I think I've missed something, anyone ?
>> Cheers
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:
>>
>> Hi Peter,
>>
>> To test, I wiped out all the data from Cassandra and inserted just one
>> record. The row key is "7a7a32323636373030303438303031". I used
>> getendpoints to see where my data is and double checked the same using
>> sstable2json command.
>>
>> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4
>> nodes cluster. I used cassandra-cli to query the data by using one of the
>> secondary index but following are my results:
>>
>>   Test   Node 1   Node 2   Node 3   Node 4   Got data back?
>>   1      Up       Up       Up       Up       Yes
>>   2      Up       Up       Up       Down     Yes
>>   3      Up       Up       Down     Up       Yes
>>   4      Down     Up       Up       Up       Yes
>>   5      Up       Up       Down     Down     No
>>   6      Down     Down     Up       Up       No
>>   7      Up       Down     Down     Up       No
>>
>> It turns out that even though my consistency level is ONE, since I am
>> using a secondary index to query the data, at least 3 nodes have to be
>> running. And out of these 3 running nodes, it works even if only one node
>> contains the data.
>>
>> Somewhere in the mailing I read that "Iterating through all of the rows
>> matching an index clause on your cluster is guaranteed to touch N/RF of the
>> nodes in your cluster, because each node only knows about data that is
>> indexed locally."
>>
>> I am not sure what N/RF means in my case. Does it mean 4/2 = 2, where 4
>> is the number of nodes and 2 is the RF? If it is 2, why is it not returning
>> any data when the two nodes that contain the data are running (test #7)?
>>
>> For my use case, I have to have an RF of 2 and should be able to query
>> using a secondary index with a CL of ONE. Is this possible when 2 nodes are
>> down in a 4-node cluster? Are there any limitations on using secondary
>> indexes?
>>
>> Thanks in advance.
>>
>> Thanks,
>> Kamal
>>
>> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <
>> peter.schuller@infidyne.com> wrote:
>>
>>> > Thanks for the response Peter! I checked everything and it look good
>>> to me.
>>> >
>>> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>>>
>>> While it is certainly possible that you're running into a bug, it
>>> seems unlikely to me since it is the kind of bug that would affect
>>> almost anyone if it is failing with Unavailable due to unrelated (not
>>> in replica sets) nodes being down.
>>>
>>> Can you please post back with (1) the ring layout ('nodetool ring'),
>>> and (2) the exact row key that you're testing with?
>>>
>>> You might also want to run with DEBUG level (modify
>>> log4j-server.properties at the top) and the strategy (assuming you are
>>> using NetworkTopologyStrategy) will log selected endpoints, and
>>> confirm that it's indeed picking endpoints that you think it should
>>> based on getendpoints.
>>>
>>> --
>>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>>>
>>
>>
>>
>
>

Re: Consistency Level

Posted by aaron morton <aa...@thelastpickle.com>.
I missed a ! in the code :) The query will break the token ring into ranges based on the node tokens and then find the UP nodes for each range. 

I've taken another walk through the code, the logs helped. 

In short, you do not have enough UP nodes to support an indexed get at CL ONE. It is working by design, and you *should* have gotten an UnavailableException returned. There must be CL up replicas for each token range. In your test node 200.190 is down and so is the next node; with RF 2 this means there are no replicas for that range. The log line below is logged just before the UnavailableException is raised:

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes  do not satisfy ConsistencyLevel (1 required)


You will need at least every RF'th node UP. Another way to look at it is: if you have RF contiguous nodes DOWN you cannot perform an indexed get.
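Aaron's rule can be sketched as a small simulation. This is illustrative only, not Cassandra's actual code; it assumes SimpleStrategy-style placement where the token range ending at node i's token is replicated on nodes i through i+RF-1 clockwise:

```java
// Illustrative sketch: an index scan must read every token range, and each
// range needs at least one live replica at CL ONE. With SimpleStrategy the
// range ending at node i's token is replicated on nodes i..i+rf-1 (wrapping),
// so RF contiguous nodes down leaves some range with zero live replicas.
class RangeAvailabilitySketch {
    static boolean indexedScanPossible(boolean[] nodeUp, int rf) {
        int n = nodeUp.length;
        for (int i = 0; i < n; i++) {            // one token range per node
            boolean anyLive = false;
            for (int j = 0; j < rf; j++) {
                if (nodeUp[(i + j) % n]) {
                    anyLive = true;
                    break;
                }
            }
            if (!anyLive) {
                return false;                    // rf contiguous nodes down
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // the failing test in this thread: nodes 1,2 up, nodes 3,4 down, RF 2
        System.out.println(indexedScanPossible(new boolean[]{true, true, false, false}, 2)); // false
        // every RF'th node up (nodes 1 and 3): no range is left without a replica
        System.out.println(indexedScanPossible(new boolean[]{true, false, true, false}, 2)); // true
    }
}
```

The same check reproduces Kamal's whole test matrix: every "No" row has two adjacent nodes down, every "Yes" row does not.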

If you are interested this is what the logs are saying…

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
There are 4 token ranges to query, i.e. we have to make 4 reads to query over the whole cluster. 

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118
Starting to read for the first token range. A bug in 0.8.6 makes it read from 202.118 when it does not need to. 


> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325704860925009,])
Scanned the secondary index on 200.130 and found an entry for the row key 7a7a32323636373030303438303031 that matched the index expression.

> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
Returning ZERO rows for the query result, because the row key we read above has the token 111413491371349413596553235966977111575L, which is not in the first token range from above (0,42535295865117307932921825928971026432], and this is the range we are interested in now.
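The "token not in the queried range" test above can be sketched as a wrap-around range check. Illustrative only, not Cassandra's implementation; it models a token range as (start, end] where end <= start means the range wraps past the maximum token:

```java
import java.math.BigInteger;

// Illustrative sketch of a token range (start, end] on the ring.
class TokenRangeSketch {
    static boolean contains(BigInteger t, BigInteger start, BigInteger end) {
        if (start.compareTo(end) < 0) {
            // ordinary range: start < t <= end
            return t.compareTo(start) > 0 && t.compareTo(end) <= 0;
        }
        // wrapping range: anything after start, or at/below end
        return t.compareTo(start) > 0 || t.compareTo(end) <= 0;
    }

    public static void main(String[] args) {
        BigInteger rowToken = new BigInteger("111413491371349413596553235966977111575");
        // not in the first range being scanned ...
        System.out.println(contains(rowToken,
                BigInteger.ZERO,
                new BigInteger("42535295865117307932921825928971026432")));  // false
        // ... but in the third range, whose replicas were both down in this test
        System.out.println(contains(rowToken,
                new BigInteger("85070591730234615865843651857942052864"),
                new BigInteger("127605887595351923798765477786913079296"))); // true
    }
}
```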

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118
Processing the second range now. There is only one node up for this range, 202.118

> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 173@/172.16.202.118
Got the callback from 202.118 for both the query ranges. 

The logs on 202.118 show the same local query. But I'm a little confused as to why the row exists on node 2 at all.

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to
Moving on, time to process the third token range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes  do not satisfy ConsistencyLevel (1 required)

Oh noes, there are no nodes available for that token range. Throw UnavailableException.

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote:

> Hi Aaron, 
> 
> Thanks for your response!
> 
> I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1 contains the data, CL ONE and RF 2). I was connected to Node 1 while I ran the test. I still did not get any data. See below logs:
> 
> -----------------------------------------------------------
> nodetool ring:
> -----------------------------------------------------------
> 
> Address         DC          Rack        Status State   Load            Owns    Token                                 
>                                                                                127605887595351923798765477786913079296
> xxx.16.200.130  datacenter1 rack1       Up     Normal  584.97 KB       25.00%  0                                     
> xxx.16.202.118  datacenter1 rack1       Up     Normal  570.91 KB       25.00%  42535295865117307932921825928971026432
> xxx.16.200.190  datacenter1 rack1       Down   Normal  593.51 KB       25.00%  85070591730234615865843651857942052864
> xxx.16.200.217  datacenter1 rack1       Down   Normal  617.23 KB       25.00%  127605887595351923798765477786913079296
> 
> -----------------------------------------------------
> Logs on Node 1
> -----------------------------------------------------
> DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line 148) Version is now 2
>  INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java (line 314) Endpoint /172.16.200.190 died before hint delivery, aborting
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 303) Checking remote schema before delivering hints
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 295) schema for /172.16.200.217 matches local schema
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 309) Sleeping 17296ms to stagger hint delivery
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java (line 670) scan
Re: Consistency Level

Posted by Kamal Bahadur <ma...@gmail.com>.
Hi Aaron,

Thanks for your response!

I re-ran test case #5 (Nodes 1 & 2 running, Nodes 3 & 4 down, Node 1
contains the data, CL ONE and RF 2). I was connected to Node 1 while I ran
the test. I still did not get any data. See the logs below:

-----------------------------------------------------------
nodetool ring:
-----------------------------------------------------------

Address         DC          Rack        Status State   Load       Owns    Token
                                                                          127605887595351923798765477786913079296
xxx.16.200.130  datacenter1 rack1       Up     Normal  584.97 KB  25.00%  0
xxx.16.202.118  datacenter1 rack1       Up     Normal  570.91 KB  25.00%  42535295865117307932921825928971026432
xxx.16.200.190  datacenter1 rack1       Down   Normal  593.51 KB  25.00%  85070591730234615865843651857942052864
xxx.16.200.217  datacenter1 rack1       Down   Normal  617.23 KB  25.00%  127605887595351923798765477786913079296

-----------------------------------------------------
Logs on Node 1
-----------------------------------------------------
DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line 148) Version is now 2
 INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java (line 314) Endpoint /172.16.200.190 died before hint delivery, aborting
DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 303) Checking remote schema before delivering hints
DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 295) schema for /172.16.200.217 matches local schema
DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 309) Sleeping 17296ms to stagger hint delivery
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java (line 670) scan
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 889) restricted ranges for query [-1,-1] are [[-1,0], (0,42535295865117307932921825928971026432], (42535295865117307932921825928971026432,85070591730234615865843651857942052864], (85070591730234615865843651857942052864,127605887595351923798765477786913079296], (127605887595351923798765477786913079296,-1]]
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
DEBUG [Thread-7] 2012-01-04 13:44:00,876 IncomingTcpConnection.java (line 148) Version is now 2
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118
DEBUG [WRITE-/172.16.202.118] 2012-01-04 13:44:00,884 OutboundTcpConnection.java (line 162) attempting to connect to /172.16.202.118
DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1550) Primary scan clause is member
DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325704860925009,])
DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
DEBUG [RequestResponseStage:1] 2012-01-04 13:44:00,895 ResponseVerbHandler.java (line 48) Processing response on a callback from 171@/172.16.200.130
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118
DEBUG [Thread-7] 2012-01-04 13:44:00,912 IncomingTcpConnection.java (line 148) Version is now 2
DEBUG [Thread-7] 2012-01-04 13:44:00,913 IncomingTcpConnection.java (line 148) Version is now 2
DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 172@/172.16.202.118
DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 173@/172.16.202.118
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
-----------------------------------------------------
Logs on Node 2
-----------------------------------------------------
DEBUG [Thread-6] 2012-01-04 13:44:00,886 Gossiper.java (line 197) Setting version 2 for /172.16.200.130
DEBUG [Thread-6] 2012-01-04 13:44:00,886 IncomingTcpConnection.java (line 105) set version for /172.16.200.130 to 2
DEBUG [Thread-6] 2012-01-04 13:44:00,895 IncomingTcpConnection.java (line 148) Version is now 2
DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1550) Primary scan clause is member
DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1550) Primary scan clause is member
DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
DEBUG [ReadStage:3] 2012-01-04 13:44:00,907 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
DEBUG [ReadStage:2] 2012-01-04 13:44:00,909 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011
DEBUG [ReadStage:2] 2012-01-04 13:44:00,910 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325615757889011,])
DEBUG [ReadStage:3] 2012-01-04 13:44:00,910 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011
DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325615757889011,])
DEBUG [ReadStage:2] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 172@/172.16.200.130
DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 173@/172.16.200.130
DEBUG [Thread-4] 2012-01-04 13:44:01,664 IncomingTcpConnection.java (line 148) Version is now 2
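The "Blockfor/repair is 1/false" and "Live nodes do not satisfy ConsistencyLevel (1 required)" lines in these logs come down to a simple count: the coordinator derives a block-for value from the consistency level and the replication factor, and the request fails with Unavailable when some scanned range has fewer live replicas than that. A minimal sketch of that relationship (the names below are illustrative, not Cassandra's actual code):

```java
public class BlockFor {
    enum CL { ONE, QUORUM, ALL }

    // How many replica responses the coordinator waits for, per range,
    // given a consistency level and a replication factor.
    static int blockFor(CL cl, int rf) {
        switch (cl) {
            case ONE:    return 1;
            case QUORUM: return rf / 2 + 1;
            default:     return rf; // ALL
        }
    }

    public static void main(String[] args) {
        // RF=2 as in this cluster: CL ONE blocks for 1 replica per range,
        // matching the "Blockfor/repair is 1/false" log lines.
        System.out.println(blockFor(CL.ONE, 2));    // prints 1
        System.out.println(blockFor(CL.QUORUM, 2)); // prints 2
    }
}
```

The key point for this thread: block-for applies per scanned range, so a range whose live replica count is zero fails the request even though block-for is only 1.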



On Wed, Jan 4, 2012 at 1:15 AM, aaron morton <aa...@thelastpickle.com>wrote:

> I've not spent much time with the secondary indexes, so a couple of
> questions.
>
> What is the output of nodetool ring?
> Which node were you connected to when you did the get ?
> If you enable DEBUG logging what do the log messages from StorageProxy say
> that contain the string "scan ranges are" and "reading .. from ..."
>
> Now for the wild guessing….It's working as designed for a CL ONE
> request. Looking at test case 5 and *assuming* you were connected to node 2
> this is what I think is happening: A get indexed slice query without a
> start key does not know which nodes will contain the data. Reading the code
> it will consider the replicas for the minimum token as nodes to send the
> query to, for a CL ONE query it will only use one. If you were connected to
> node 2 the query would have executed only on node 2.
>
> This is where I get confused. What happens if you have 50 nodes, with RF
> 3, and you execute a get_indexed_slice at QUORUM with no start_key and the
> only rows that satisfy the query exist on nodes 47, 48 and 49. So they are
> a long way away from the minimum token, assuming SimpleStrategy and well
> ordered token ring.
>
> I think I've missed something, anyone ?
> Cheers
>
>   -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:
>
> Hi Peter,
>
> To test, I wiped out all the data from Cassandra and inserted just one
> record. The row key is "7a7a32323636373030303438303031". I used
> getendpoints to see where my data is and double checked the same using
> sstable2json command.
>
> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4
> nodes cluster. I used cassandra-cli to query the data by using one of the
> secondary index but following are my results:
>
>   Test  Node 1  Node 2  Node 3  Node 4  Got data back?
>   1     Up      Up      Up      Up      Yes
>   2     Up      Up      Up      Down    Yes
>   3     Up      Up      Down    Up      Yes
>   4     Down    Up      Up      Up      Yes
>   5     Up      Up      Down    Down    No
>   6     Down    Down    Up      Up      No
>   7     Up      Down    Down    Up      No
>
> It turns out that even though my consistency level is ONE, since I am
> using secondary index to query the data, at least 3 nodes has to be
> running. And out of these 3 running nodes, it works even if one nodes
> contains the data.
>
> Somewhere in the mailing I read that "Iterating through all of the rows
> matching an index clause on your cluster is guaranteed to touch N/RF of the
> nodes in your cluster, because each node only knows about data that is
> indexed locally."
>
> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is
> the number of nodes and 2 is the RF. If it is 2, why is it not returning
> any data when the two nodes that contains the data is running (test #7)?
>
> For my use case, I have to have a RF of 2 and should be able to query
> using secondary index with a CL of ONE. Is this possible when 2 nodes are
> down in a 4 nodes cluster? Is there any limitations on using secondary
> index?
>
> Thanks in advance.
>
> Thanks,
> Kamal
>
> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <
> peter.schuller@infidyne.com> wrote:
>
>> > Thanks for the response Peter! I checked everything and it look good to
>> me.
>> >
>> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>>
>> While it is certainly possible that you're running into a bug, it
>> seems unlikely to me since it is the kind of bug that would affect
>> almost anyone if it is failing with Unavailable due to unrelated (not
>> in replica sets) nodes being down.
>>
>> Can you please post back with (1) the ring layout ('nodetool ring'),
>> and (2) the exact row key that you're testing with?
>>
>> You might also want to run with DEBUG level (modify
>> log4j-server.properties at the top) and the strategy (assuming you are
>> using NetworkTopologyStrategy) will log selected endpoints, and
>> confirm that it's indeed picking endpoints that you think it should
>> based on getendpoints.
>>
>> --
>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>>
>
>
>

Re: Consistency Level

Posted by aaron morton <aa...@thelastpickle.com>.
I've not spent much time with the secondary indexes, so a couple of questions. 

What is the output of nodetool ring?
Which node were you connected to when you did the get?
If you enable DEBUG logging, what do the log messages from StorageProxy say that contain the strings "scan ranges are" and "reading .. from ..."?
 
Now for the wild guessing… It's working as designed for a CL ONE request. Looking at test case 5 and *assuming* you were connected to node 2, this is what I think is happening: a get indexed slice query without a start key does not know which nodes will contain the data. Reading the code, it will consider the replicas for the minimum token as the nodes to send the query to, and for a CL ONE query it will only use one. If you were connected to node 2, the query would have executed only on node 2. 
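The "restricted ranges" DEBUG lines quoted elsewhere in this thread match that description: the full token range is split at every node token, so the scan covers one sub-range per node. A simplified sketch of that splitting (illustrative names, not Cassandra's API; the real code additionally splits at the query's own bounds, which is where the extra [-1,0] range in the logs comes from):

```java
import java.math.BigInteger;
import java.util.*;

public class RestrictedRanges {
    // Split the whole ring at each node token: one (left, right] range per node,
    // wrapping around from the highest token back to the lowest.
    public static List<BigInteger[]> split(List<BigInteger> nodeTokens) {
        List<BigInteger> sorted = new ArrayList<>(nodeTokens);
        Collections.sort(sorted);
        List<BigInteger[]> ranges = new ArrayList<>();
        BigInteger left = sorted.get(sorted.size() - 1); // wrap from the last token
        for (BigInteger t : sorted) {
            ranges.add(new BigInteger[]{left, t});
            left = t;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // The four tokens from the nodetool ring output in this thread.
        List<BigInteger> tokens = Arrays.asList(
            BigInteger.ZERO,
            new BigInteger("42535295865117307932921825928971026432"),
            new BigInteger("85070591730234615865843651857942052864"),
            new BigInteger("127605887595351923798765477786913079296"));
        for (BigInteger[] r : split(tokens)) {
            System.out.println("(" + r[0] + ", " + r[1] + "]");
        }
    }
}
```

Because every one of these ranges must be read, the scan touches (a covering subset of) the whole ring, not just the replicas of the matching row.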

This is where I get confused. What happens if you have 50 nodes, with RF 3, and you execute a get_indexed_slice at QUORUM with no start_key, and the only rows that satisfy the query exist on nodes 47, 48 and 49? They are a long way from the minimum token, assuming SimpleStrategy and a well ordered token ring. 

I think I've missed something, anyone ?
Cheers
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:

> Hi Peter,
> 
> To test, I wiped out all the data from Cassandra and inserted just one record. The row key is "7a7a32323636373030303438303031". I used getendpoints to see where my data is and double checked the same using sstable2json command.
> 
> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4 nodes cluster. I used cassandra-cli to query the data by using one of the secondary index but following are my results:
> 
> Test  Node 1  Node 2  Node 3  Node 4  Got data back?
> 1     Up      Up      Up      Up      Yes
> 2     Up      Up      Up      Down    Yes
> 3     Up      Up      Down    Up      Yes
> 4     Down    Up      Up      Up      Yes
> 5     Up      Up      Down    Down    No
> 6     Down    Down    Up      Up      No
> 7     Up      Down    Down    Up      No
> 
> 
> It turns out that even though my consistency level is ONE, since I am using secondary index to query the data, at least 3 nodes has to be running. And out of these 3 running nodes, it works even if one nodes contains the data.
> 
> Somewhere in the mailing I read that "Iterating through all of the rows matching an index clause on your cluster is guaranteed to touch N/RF of the nodes in your cluster, because each node only knows about data that is indexed locally."
> 
> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is the number of nodes and 2 is the RF. If it is 2, why is it not returning any data when the two nodes that contains the data is running (test #7)?
> 
> For my use case, I have to have a RF of 2 and should be able to query using secondary index with a CL of ONE. Is this possible when 2 nodes are down in a 4 nodes cluster? Is there any limitations on using secondary index?
> 
> Thanks in advance.
> 
> Thanks,
> Kamal
> 
> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <pe...@infidyne.com> wrote:
> > Thanks for the response Peter! I checked everything and it look good to me.
> >
> > I am stuck with this for almost 2 days now. Has anyone had this issue?
> 
> While it is certainly possible that you're running into a bug, it
> seems unlikely to me since it is the kind of bug that would affect
> almost anyone if it is failing with Unavailable due to unrelated (not
> in replica sets) nodes being down.
> 
> Can you please post back with (1) the ring layout ('nodetool ring'),
> and (2) the exact row key that you're testing with?
> 
> You might also want to run with DEBUG level (modify
> log4j-server.properties at the top) and the strategy (assuming you are
> using NetworkTopologyStrategy) will log selected endpoints, and
> confirm that it's indeed picking endpoints that you think it should
> based on getendpoints.
> 
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> 


Re: Consistency Level

Posted by Kamal Bahadur <ma...@gmail.com>.
Hi Peter,

To test, I wiped out all the data from Cassandra and inserted just one
record. The row key is "7a7a32323636373030303438303031". I used
getendpoints to see where my data is and double checked the same using
sstable2json command.

Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4-node
cluster. I used cassandra-cli to query the data using one of the secondary
indexes; my results are below:

Test  Node 1  Node 2  Node 3  Node 4  Got data back?
1     Up      Up      Up      Up      Yes
2     Up      Up      Up      Down    Yes
3     Up      Up      Down    Up      Yes
4     Down    Up      Up      Up      Yes
5     Up      Up      Down    Down    No
6     Down    Down    Up      Up      No
7     Up      Down    Down    Up      No

It turns out that even though my consistency level is ONE, since I am using
a secondary index to query the data, at least 3 nodes have to be running. And
out of these 3 running nodes, it works even if only one of them contains the data.

Somewhere in the mailing I read that "Iterating through all of the rows
matching an index clause on your cluster is guaranteed to touch N/RF of the
nodes in your cluster, because each node only knows about data that is
indexed locally."

I am not sure what N/RF means in my case. Does it mean 4/2 = 2, where 4 is
the number of nodes and 2 is the RF? If it is 2, why is it not returning
any data when the two nodes that contain the data are running (test #7)?
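One way to see why test #7 can fail even with both replicas of the row alive: with SimpleStrategy, each token range is replicated on its primary owner plus the next RF-1 nodes clockwise, and an index scan must be able to read every range. A sketch under those assumptions (illustrative code, not Cassandra's):

```java
import java.util.*;

public class RingSketch {
    // SimpleStrategy sketch: replicas for range i are node i and the next
    // rf-1 nodes clockwise around the ring.
    public static List<List<String>> replicaMap(String[] nodesByToken, int rf) {
        List<List<String>> map = new ArrayList<>();
        for (int i = 0; i < nodesByToken.length; i++) {
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < rf; r++) {
                replicas.add(nodesByToken[(i + r) % nodesByToken.length]);
            }
            map.add(replicas);
        }
        return map;
    }

    public static void main(String[] args) {
        String[] nodes = {"Node1", "Node2", "Node3", "Node4"}; // ordered by token
        List<List<String>> map = replicaMap(nodes, 2);
        for (int i = 0; i < map.size(); i++) {
            System.out.println("range " + i + " -> " + map.get(i));
        }
        // range 1 is replicated only on Node2 and Node3. In test #7 those are
        // both down, so one scanned range has zero live replicas and the scan
        // fails with Unavailable even at CL ONE, although the matching row's
        // own replicas (Node1 and Node4) are up.
    }
}
```

Under this model, "N/RF nodes" (4/2 = 2 here) is the minimum number of nodes that together cover every range, but it must be the *right* 2 nodes: one live replica per range, e.g. {Node1, Node3} or {Node2, Node4}, not {Node1, Node4}.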

For my use case, I have to have an RF of 2 and should be able to query using
a secondary index with a CL of ONE. Is this possible when 2 nodes are down in
a 4-node cluster? Are there any limitations on using secondary indexes?

Thanks in advance.

Thanks,
Kamal

On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <peter.schuller@infidyne.com
> wrote:

> > Thanks for the response Peter! I checked everything and it look good to
> me.
> >
> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>
> While it is certainly possible that you're running into a bug, it
> seems unlikely to me since it is the kind of bug that would affect
> almost anyone if it is failing with Unavailable due to unrelated (not
> in replica sets) nodes being down.
>
> Can you please post back with (1) the ring layout ('nodetool ring'),
> and (2) the exact row key that you're testing with?
>
> You might also want to run with DEBUG level (modify
> log4j-server.properties at the top) and the strategy (assuming you are
> using NetworkTopologyStrategy) will log selected endpoints, and
> confirm that it's indeed picking endpoints that you think it should
> based on getendpoints.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>

Re: Consistency Level

Posted by Peter Schuller <pe...@infidyne.com>.
> Thanks for the response Peter! I checked everything and it look good to me.
>
> I am stuck with this for almost 2 days now. Has anyone had this issue?

While it is certainly possible that you're running into a bug, it
seems unlikely to me since it is the kind of bug that would affect
almost anyone if it is failing with Unavailable due to unrelated (not
in replica sets) nodes being down.

Can you please post back with (1) the ring layout ('nodetool ring'),
and (2) the exact row key that you're testing with?

You might also want to run with DEBUG level (modify
log4j-server.properties at the top) and the strategy (assuming you are
using NetworkTopologyStrategy) will log selected endpoints, and
confirm that it's indeed picking endpoints that you think it should
based on getendpoints.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
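
Peter's suggestion to enable DEBUG logging corresponds to a one-line change at the top of Cassandra's conf/log4j-server.properties. The exact default may differ between versions, so treat the appender names below as what ships in your file rather than a prescription; only the level needs to change from INFO to DEBUG:

```
# conf/log4j-server.properties: raise the root logger level to DEBUG
log4j.rootLogger=DEBUG,stdout,R
```

Restart the node (or wait for the periodic log4j config reload, if your version has it enabled) for the change to take effect, and remember to revert it afterwards since DEBUG output is very verbose.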

Re: Consistency Level

Posted by Kamal Bahadur <ma...@gmail.com>.
Thanks for the response, Peter! I checked everything and it looks good to me.

I am stuck with this for almost 2 days now. Has anyone had this issue?

Thanks,
Kamal

On Wed, Dec 28, 2011 at 2:05 PM, Kamal Bahadur <ma...@gmail.com>wrote:

> Hi All,
>
> My Cassandra cluster has 4 nodes with an RF of 2. I am trying to verify that
> my data gets replicated to 2 nodes with a write consistency level of ONE.
> All the tests that I have done so far tell me that the data is not getting
> replicated for some reason.
>
> I executed the getendpoints command to find out the nodes where my record
> lives, tried to keep those two nodes running, and brought the other nodes
> down. When I try to read the record using Hector, I get this exception:
> "May not be enough replicas present to handle consistency level"
>
> I tried to read data using cassandra-cli but I am getting "null".
>
> I ran a manual repair command, but still getting the same exception. I
> noticed that as soon as the number of active nodes becomes less than 3 I
> get this exception.
>
> With consistency level ONE, I would assume that with just one node up and
> running (of course the one that has the data) I should get my data back.
> But this is not happening.
>
> Will the read repair happen automatically even if I read and write using
> the consistency level ONE?
>
> Any help will be much appreciated.
>
> Thanks,
> Kamal
>
> Environment details:
>
> Cluster: *4 nodes*
> RF: *2*
> Hector: *1.0-1*
> Cassandra: *0.8.6*
> Read CL: *ONE*
> Write CL: *ONE*
>
> Output of describe keyspace:
>
> Keyspace: MyKF:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:2]
>   Column Families:
>     ColumnFamily: MyCF
>       Key Validation Class: org.apache.cassandra.db.marshal.BytesType
>       Default column value validator:
> org.apache.cassandra.db.marshal.BytesType
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.5812499999999999/1440/124 (millions of
> ops/minutes/MB)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: [MyCF.MyCF_appId_idx, MyCF.MyCF_bcId_idx]
>       Column Metadata:
>         Column Name: .appId
>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>           Index Name: MyCF_appId_idx
>           Index Type: KEYS
>         Column Name: .bcId
>           Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>           Index Name: MyCF_bcId_idx
>           Index Type: KEYS
>
> My hector code:
>
> public class MyHectorTemplate extends HectorTemplateImpl {
>     public void init() {
>         CassandraHostConfigurator cassandraHostConfigurator =
>                 new CassandraHostConfigurator(servers);
>         cassandraHostConfigurator.setLoadBalancingPolicy(new LeastActiveBalancingPolicy());
>         ThriftCluster cluster = new ThriftCluster(
>                 configuration.getString("cassandra_cluster_name"),
>                 cassandraHostConfigurator);
>         setCluster(cluster);
>         setKeyspaceName(configuration.getString("cassandra_keyspace_name"));
>         ConfigurableConsistencyLevel configurableConsistencyLevelPolicy =
>                 new ConfigurableConsistencyLevel();
>         configurableConsistencyLevelPolicy.setDefaultReadConsistencyLevel(HConsistencyLevel.ONE);
>         setConfigurableConsistencyLevelPolicy(configurableConsistencyLevelPolicy);
>         super.init();
>         logger.info("------------------------------------------->"
>                 + getConfigurableConsistencyLevelPolicy().get(OperationType.READ).toString());
>     }
> }
>
> Here is the log on one of the nodes:
>
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,631 ClientState.java (line 87) logged in: #<User allow_all groups=[]>
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,651 CassandraServer.java (line 670) scan
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,665 StorageProxy.java (line 889) restricted ranges for query [-1,-1] are [[-1,0], (0,42535295865117307932921825928971026432], (42535295865117307932921825928971026432,85070591730234615865843651857942052864], (85070591730234615865843651857942052864,127605887595351923798765477786913079296], (127605887595351923798765477786913079296,-1]]
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,666 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,679 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to
> DEBUG [pool-2-thread-2] 2011-12-28 09:33:47,679 ReadCallback.java (line 203) Live nodes do not satisfy ConsistencyLevel (1 required)