You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Kamal Bahadur <ma...@gmail.com> on 2012/01/03 21:44:20 UTC

Re: Consistency Level

Hi Peter,

To test, I wiped out all the data from Cassandra and inserted just one
record. The row key is "7a7a32323636373030303438303031". I used
getendpoints to see where my data is and double checked the same using
sstable2json command.

Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4 nodes
cluster. I used cassandra-cli to query the data by using one of the
secondary index but following are my results:

  Test Node 1 Node 2 Node 3 Node 4 Got data back?  1 Up Up Up Up Yes  2 Up
Up Up
Yes  3 Up Up
Up Yes  4
Up Up Up Yes  5 Up Up

No  6

Up Up No  7 Up

Up No

It turns out that even though my consistency level is ONE, since I am using
secondary index to query the data, at least 3 nodes has to be running. And
out of these 3 running nodes, it works even if one nodes contains the data.

Somewhere in the mailing I read that "Iterating through all of the rows
matching an index clause on your cluster is guaranteed to touch N/RF of the
nodes in your cluster, because each node only knows about data that is
indexed locally."

I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is
the number of nodes and 2 is the RF. If it is 2, why is it not returning
any data when the two nodes that contains the data is running (test #7)?

For my use case, I have to have a RF of 2 and should be able to query using
secondary index with a CL of ONE. Is this possible when 2 nodes are down in
a 4 nodes cluster? Is there any limitations on using secondary index?

Thanks in advance.

Thanks,
Kamal

On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <peter.schuller@infidyne.com
> wrote:

> > Thanks for the response Peter! I checked everything and it look good to
> me.
> >
> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>
> While it is certainly possible that you're running into a bug, it
> seems unlikely to me since it is the kind of bug that would affect
> almost anyone if it is failing with Unavailable due to unrelated (not
> in replica sets) nodes being down.
>
> Can you please post back with (1) the ring layout ('nodetool ring'),
> and (2) the exact row key that you're testing with?
>
> You might also want to run with DEBUG level (modify
> log4j-server.properties at the top) and the strategy (assuming you are
> using NetworkTopologyStrategy) will log selected endpoints, and
> confirm that it's indeed picking endpoints that you think it should
> based on getendpoints.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>

Re: Consistency Level

Posted by Kamal Bahadur <ma...@gmail.com>.

This is what I was trying to understand. Thanks for your time Aaron! I
could not find any documentation on the web that explains the secondary
index like how you have done. Great job!

On Thu, Jan 5, 2012 at 3:18 AM, aaron morton <aa...@thelastpickle.com>wrote:

> I missed a ! in the code :) The query will break the token ring into
> ranges based on the node tokens and then find the UP nodes for each range.
>
> I've taken another walk through the code, the logs helped.
>
> In short, you do not have enough UP nodes to support an indexed get at CL
> ONE. It is working by design and you *should* have gotten an
> UnavailableException returned. There must be CL up replicas for each token
> range. In your test node 200.190 is down and so the next node, with RF 2
> this means there are no replicas for the range. The log line below is
> logged just before the UnavalableException is raised
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
>
>
> You will need at least every RF'th node UP. Another way to look at is if
> you have RF contiguous nodes DOWN you cannot perform an indexed get.
>
> If you are interested this is what the logs are saying…
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
> 976) scan ranges are
> [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
>
> There are 4 token ranges to query, i.e. we have to make 4 reads to query
> over the whole cluster.
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.202.118
>
> Starting to read for the first token range. A bug in 0.8.6 makes it read
> from 202.118 when it does not need to.
>
>
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325704860925009,])
>
> Scanned the secondary index on 200.130 and found an entry for the row
> key 7a7a32323636373030303438303031 matched the index expression.
>
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
>
> Returning ZERO rows for the query result. Because the row key we read
> above has the token 111413491371349413596553235966977111575L which is not
> in the first token range from
> above (0,42535295865117307932921825928971026432] and this is the range we
> are interested in now.
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /
> 172.16.202.118
>
> Processing the second range now. There is only one node up for this range,
> 202.118
>
> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 173@/172.16.202.118
>
> Got the callback from 202.118 for both the query ranges.
>
> The logs on 202.118 show the same local local query. But I'm a little
> confused as to why the row exists on node 2 at all.
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to
>
> Moving on, time to process the third token range
> (85070591730234615865843651857942052864,127605887595351923798765477786913079296]
>
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
>
> Oh noes there are no nodes available for that token range. Throw
> UnavailableException
>
> Hope that helps.
>
>
>   -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote:
>
> Hi Aaron,
>
> Thanks for your response!
>
> I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1
> contains the data, CL ONE and RF 2). I was connected to Node 1 while I ran
> the test. I still did not get any data. See below logs:
>
> -----------------------------------------------------------
> nodetool ring:
> -----------------------------------------------------------
>
> Address         DC          Rack        Status State   Load
> Owns    Token
>
> 127605887595351923798765477786913079296
> xxx.16.200.130  datacenter1 rack1       Up     Normal  584.97 KB
> 25.00%  0
> xxx.16.202.118  datacenter1 rack1       Up     Normal  570.91 KB
> 25.00%  42535295865117307932921825928971026432
> xxx.16.200.190  datacenter1 rack1       Down   Normal  593.51 KB
> 25.00%  85070591730234615865843651857942052864
> xxx.16.200.217  datacenter1 rack1       Down   Normal  617.23 KB
> 25.00%  127605887595351923798765477786913079296
>
> -----------------------------------------------------
> Logs on Node 1
> -----------------------------------------------------
> DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line
> 148) Version is now 2
>  INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java
> (line 314) Endpoint /172.16.200.190 died before hint delivery, aborting
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
> (line 303) Checking remote schema before delivering hints
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
> (line 295) schema for /172.16.200.217 matches local schema
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
> (line 309) Sleeping 17296ms to stagger hint delivery
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java
> (line 670) scan
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
> 889) restricted ranges for query [-1,-1] are [[-1,0],
> (0,42535295865117307932921825928971026432],
> (42535295865117307932921825928971026432,85070591730234615865843651857942052864],
> (85070591730234615865843651857942052864,127605887595351923798765477786913079296],
> (127605887595351923798765477786913079296,-1]]
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
> 976) scan ranges are
> [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
> DEBUG [Thread-7] 2012-01-04 13:44:00,876 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
> 172.16.202.118
> DEBUG [WRITE-/172.16.202.118] 2012-01-04 13:44:00,884
> OutboundTcpConnection.java (line 162) attempting to connect to /
> 172.16.202.118
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325704860925009,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
> DEBUG [RequestResponseStage:1] 2012-01-04 13:44:00,895
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 171@/172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line
> 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /
> 172.16.202.118
> DEBUG [Thread-7] 2012-01-04 13:44:00,912 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [Thread-7] 2012-01-04 13:44:00,913 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913
> ResponseVerbHandler.java (line 48) Processing response on a callback from
> 173@/172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 76) Blockfor/repair is 1/false; setting up requests to
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
> 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
> -----------------------------------------------------
> Logs on Node 2
> -----------------------------------------------------
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 Gossiper.java (line 197) Setting
> version 2 for /172.16.200.130
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 IncomingTcpConnection.java (line
> 105) set version for /172.16.200.130 to 2
> DEBUG [Thread-6] 2012-01-04 13:44:00,895 IncomingTcpConnection.java (line
> 148) Version is now 2
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,907 ColumnFamilyStore.java (line
> 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,909 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,910 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,910 SliceQueryFilter.java (line
> 123) collecting 0 of 100:
> 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 ColumnFamilyStore.java (line
> 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
> [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 172@/172.16.200.130
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java
> (line 46) Sending RangeSliceReply{rows=} to 173@/172.16.200.130
> DEBUG [Thread-4] 2012-01-04 13:44:01,664 IncomingTcpConnection.java (line
> 148) Version is now 2
>
>
>
> On Wed, Jan 4, 2012 at 1:15 AM, aaron morton <aa...@thelastpickle.com>wrote:
>
>> I've not spent much time with the secondary indexes, so a couple of
>> questions.
>>
>> Whats is the output of nodetool ring ?
>> Which node were you connected to when you did the get ?
>> If you enable DEBUG logging what do the log messages from StorageProxy
>> say that contain the string "scan ranges are" and "reading .. from ..."
>>
>> Now for the wild guessing….It's working as designed for a CL ONE
>> request. Looking at test case 5 and *assuming* you were connected to node 2
>> this is what I think is happening: A get indexed slice query without a
>> start key does not know which nodes will contain the data. Reading the code
>> it will consider the replicas for the minimum token as nodes to send the
>> query to, for a CL ONE query it will only use one. If you were connected to
>> node 2 the query would have executed only on node 2.
>>
>> This is where I get confused. What happens if you have 50 nodes, with RF
>> 3, and you execute a get_indexed_slice at QUOURM with no start_key and the
>> only rows that satisfy the query exist on nodes 47, 48 and 49. So they are
>> a long way away from the minimum token, assuming SimpleStrategy and well
>> ordered token ring.
>>
>> I think I've missed something, anyone ?
>> Cheers
>>
>>   -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:
>>
>> Hi Peter,
>>
>> To test, I wiped out all the data from Cassandra and inserted just one
>> record. The row key is "7a7a32323636373030303438303031". I used
>> getendpoints to see where my data is and double checked the same using
>> sstable2json command.
>>
>> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4
>> nodes cluster. I used cassandra-cli to query the data by using one of the
>> secondary index but following are my results:
>>
>>   Test Node 1 Node 2 Node 3 Node 4 Got data back?  1 Up Up Up Up Yes  2
>> Up Up Up
>> Yes  3 Up Up
>> Up Yes  4
>> Up Up Up Yes  5 Up Up
>>
>> No  6
>>
>> Up Up No  7 Up
>>
>> Up No
>>
>> It turns out that even though my consistency level is ONE, since I am
>> using secondary index to query the data, at least 3 nodes has to be
>> running. And out of these 3 running nodes, it works even if one nodes
>> contains the data.
>>
>> Somewhere in the mailing I read that "Iterating through all of the rows
>> matching an index clause on your cluster is guaranteed to touch N/RF of the
>> nodes in your cluster, because each node only knows about data that is
>> indexed locally."
>>
>> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4
>> is the number of nodes and 2 is the RF. If it is 2, why is it not returning
>> any data when the two nodes that contains the data is running (test #7)?
>>
>> For my use case, I have to have a RF of 2 and should be able to query
>> using secondary index with a CL of ONE. Is this possible when 2 nodes are
>> down in a 4 nodes cluster? Is there any limitations on using secondary
>> index?
>>
>> Thanks in advance.
>>
>> Thanks,
>> Kamal
>>
>> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <
>> peter.schuller@infidyne.com> wrote:
>>
>>> > Thanks for the response Peter! I checked everything and it look good
>>> to me.
>>> >
>>> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>>>
>>> While it is certainly possible that you're running into a bug, it
>>> seems unlikely to me since it is the kind of bug that would affect
>>> almost anyone if it is failing with Unavailable due to unrelated (not
>>> in replica sets) nodes being down.
>>>
>>> Can you please post back with (1) the ring layout ('nodetool ring'),
>>> and (2) the exact row key that you're testing with?
>>>
>>> You might also want to run with DEBUG level (modify
>>> log4j-server.properties at the top) and the strategy (assuming you are
>>> using NetworkTopologyStrategy) will log selected endpoints, and
>>> confirm that it's indeed picking endpoints that you think it should
>>> based on getendpoints.
>>>
>>> --
>>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>>>
>>
>>
>>
>
>

Re: Consistency Level

Posted by aaron morton <aa...@thelastpickle.com>.

I missed a ! in the code :) The query will break the token ring into ranges based on the node tokens and then find the UP nodes for each range. 

I've taken another walk through the code, the logs helped. 

In short, you do not have enough UP nodes to support an indexed get at CL ONE. It is working by design and you *should* have gotten an UnavailableException returned. There must be CL up replicas for each token range. In your test node 200.190 is down and so the next node, with RF 2 this means there are no replicas for the range. The log line below is logged just before the UnavalableException is raised

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes  do not satisfy ConsistencyLevel (1 required)


You will need at least every RF'th node UP. Another way to look at is if you have RF contiguous nodes DOWN you cannot perform an indexed get. 

If you are interested this is what the logs are saying…

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
There are 4 token ranges to query, i.e. we have to make 4 reads to query over the whole cluster. 

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118
Starting to read for the first token range. A bug in 0.8.6 makes it read from 202.118 when it does not need to. 


> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325704860925009,])
Scanned the secondary index on 200.130 and found an entry for the row key 7a7a32323636373030303438303031 matched the index expression. 

> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
Returning ZERO rows for the query result. Because the row key we read above has the token 111413491371349413596553235966977111575L which is not in the first token range from above (0,42535295865117307932921825928971026432] and this is the range we are interested in now. 

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118
Processing the second range now. There is only one node up for this range, 202.118

> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 173@/172.16.202.118
Got the callback from 202.118 for both the query ranges. 

The logs on 202.118 show the same local local query. But I'm a little confused as to why the row exists on node 2 at all.

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to
Moving on, time to process the third token range (85070591730234615865843651857942052864,127605887595351923798765477786913079296]

> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes  do not satisfy ConsistencyLevel (1 required)

Oh noes there are no nodes available for that token range. Throw UnavailableException

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/01/2012, at 10:52 AM, Kamal Bahadur wrote:

> Hi Aaron, 
> 
> Thanks for your response!
> 
> I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1 contains the data, CL ONE and RF 2). I was connected to Node 1 while I ran the test. I still did not get any data. See below logs:
> 
> -----------------------------------------------------------
> nodetool ring:
> -----------------------------------------------------------
> 
> Address         DC          Rack        Status State   Load            Owns    Token                                 
>                                                                                127605887595351923798765477786913079296
> xxx.16.200.130  datacenter1 rack1       Up     Normal  584.97 KB       25.00%  0                                     
> xxx.16.202.118  datacenter1 rack1       Up     Normal  570.91 KB       25.00%  42535295865117307932921825928971026432
> xxx.16.200.190  datacenter1 rack1       Down   Normal  593.51 KB       25.00%  85070591730234615865843651857942052864
> xxx.16.200.217  datacenter1 rack1       Down   Normal  617.23 KB       25.00%  127605887595351923798765477786913079296
> 
> -----------------------------------------------------
> Logs on Node 1
> -----------------------------------------------------
> DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line 148) Version is now 2
>  INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java (line 314) Endpoint /172.16.200.190 died before hint delivery, aborting
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 303) Checking remote schema before delivering hints
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 295) schema for /172.16.200.217 matches local schema
> DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java (line 309) Sleeping 17296ms to stagger hint delivery
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java (line 670) scan
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 889) restricted ranges for query [-1,-1] are [[-1,0], (0,42535295865117307932921825928971026432], (42535295865117307932921825928971026432,85070591730234615865843651857942052864], (85070591730234615865843651857942052864,127605887595351923798765477786913079296], (127605887595351923798765477786913079296,-1]]
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line 976) scan ranges are [-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
> DEBUG [Thread-7] 2012-01-04 13:44:00,876 IncomingTcpConnection.java (line 148) Version is now 2
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /172.16.202.118
> DEBUG [WRITE-/172.16.202.118] 2012-01-04 13:44:00,884 OutboundTcpConnection.java (line 162) attempting to connect to /172.16.202.118
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325704860925009
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325704860925009,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
> DEBUG [RequestResponseStage:1] 2012-01-04 13:44:00,895 ResponseVerbHandler.java (line 48) Processing response on a callback from 171@/172.16.200.130
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to /172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line 1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /172.16.202.118
> DEBUG [Thread-7] 2012-01-04 13:44:00,912 IncomingTcpConnection.java (line 148) Version is now 2
> DEBUG [Thread-7] 2012-01-04 13:44:00,913 IncomingTcpConnection.java (line 148) Version is now 2
> DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 172@/172.16.202.118
> DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913 ResponseVerbHandler.java (line 48) Processing response on a callback from 173@/172.16.202.118
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76) Blockfor/repair is 1/false; setting up requests to
> DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 203) Live nodes  do not satisfy ConsistencyLevel (1 required)
> -----------------------------------------------------
> Logs on Node 2
> -----------------------------------------------------
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 Gossiper.java (line 197) Setting version 2 for /172.16.200.130
> DEBUG [Thread-6] 2012-01-04 13:44:00,886 IncomingTcpConnection.java (line 105) set version for /172.16.200.130 to 2
> DEBUG [Thread-6] 2012-01-04 13:44:00,895 IncomingTcpConnection.java (line 148) Version is now 2
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1550) Primary scan clause is member
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1550) Primary scan clause is member
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1563) Expanding slice filter to entire row to cover additional expressions
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,907 ColumnFamilyStore.java (line 1605) Scanning index 'Audit_Log.member EQ kamal' starting with
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,909 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,910 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,910 SliceQueryFilter.java (line 123) collecting 0 of 100: 7a7a32323636373030303438303031:false:0@1325615757889011
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 ColumnFamilyStore.java (line 1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx [7a7a32323636373030303438303031:false:0@1325615757889011,])
> DEBUG [ReadStage:2] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 172@/172.16.200.130
> DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line 46) Sending RangeSliceReply{rows=} to 173@/172.16.200.130
> DEBUG [Thread-4] 2012-01-04 13:44:01,664 IncomingTcpConnection.java (line 148) Version is now 2
> 
> 
> 
> On Wed, Jan 4, 2012 at 1:15 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I've not spent much time with the secondary indexes, so a couple of questions. 
> 
> Whats is the output of nodetool ring ? 
> Which node were you connected to when you did the get ?
> If you enable DEBUG logging what do the log messages from StorageProxy say that contain the string "scan ranges are" and "reading .. from ..."
>  
> Now for the wild guessing….It's working as designed for a CL ONE request. Looking at test case 5 and *assuming* you were connected to node 2 this is what I think is happening: A get indexed slice query without a start key does not know which nodes will contain the data. Reading the code it will consider the replicas for the minimum token as nodes to send the query to, for a CL ONE query it will only use one. If you were connected to node 2 the query would have executed only on node 2. 
> 
> This is where I get confused. What happens if you have 50 nodes, with RF 3, and you execute a get_indexed_slice at QUOURM with no start_key and the only rows that satisfy the query exist on nodes 47, 48 and 49. So they are a long way away from the minimum token, assuming SimpleStrategy and well ordered token ring. 
> 
> I think I've missed something, anyone ?
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:
> 
>> Hi Peter,
>> 
>> To test, I wiped out all the data from Cassandra and inserted just one record. The row key is "7a7a32323636373030303438303031". I used getendpoints to see where my data is and double checked the same using sstable2json command.
>> 
>> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4 nodes cluster. I used cassandra-cli to query the data by using one of the secondary index but following are my results:
>> 
>> Test	 Node 1	 Node 2	 Node 3	 Node 4	 Got data back?
>> 1	Up	Up	Up	Up	Yes
>> 2	Up	Up	Up	
>> Yes
>> 3	Up	Up	
>> Up	Yes
>> 4	
>> Up	Up	Up	Yes
>> 5	Up	Up	
>> 
>> No
>> 6	
>> 
>> Up	Up	No
>> 7	Up	
>> 
>> Up	No
>> 
>> 
>> It turns out that even though my consistency level is ONE, since I am using secondary index to query the data, at least 3 nodes has to be running. And out of these 3 running nodes, it works even if one nodes contains the data.
>> 
>> Somewhere in the mailing I read that "Iterating through all of the rows matching an index clause on your cluster is guaranteed to touch N/RF of the nodes in your cluster, because each node only knows about data that is indexed locally."
>> 
>> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is the number of nodes and 2 is the RF. If it is 2, why is it not returning any data when the two nodes that contains the data is running (test #7)?
>> 
>> For my use case, I have to have a RF of 2 and should be able to query using secondary index with a CL of ONE. Is this possible when 2 nodes are down in a 4 nodes cluster? Is there any limitations on using secondary index?
>> 
>> Thanks in advance.
>> 
>> Thanks,
>> Kamal
>> 
>> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <pe...@infidyne.com> wrote:
>> > Thanks for the response Peter! I checked everything and it look good to me.
>> >
>> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>> 
>> While it is certainly possible that you're running into a bug, it
>> seems unlikely to me since it is the kind of bug that would affect
>> almost anyone if it is failing with Unavailable due to unrelated (not
>> in replica sets) nodes being down.
>> 
>> Can you please post back with (1) the ring layout ('nodetool ring'),
>> and (2) the exact row key that you're testing with?
>> 
>> You might also want to run with DEBUG level (modify
>> log4j-server.properties at the top) and the strategy (assuming you are
>> using NetworkTopologyStrategy) will log selected endpoints, and
>> confirm that it's indeed picking endpoints that you think it should
>> based on getendpoints.
>> 
>> --
>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>> 
> 
>

Re: Consistency Level

Posted by Kamal Bahadur <ma...@gmail.com>.

Hi Aaron,

Thanks for your response!

I re-ran the test case # 5. (Node 1 & 2 running, Node 3 & 4 down, Node 1
contains the data, CL ONE and RF 2). I was connected to Node 1 while I ran
the test. I still did not get any data. See below logs:

-----------------------------------------------------------
nodetool ring:
-----------------------------------------------------------

Address         DC          Rack        Status State   Load
Owns    Token

127605887595351923798765477786913079296
xxx.16.200.130  datacenter1 rack1       Up     Normal  584.97 KB
25.00%  0
xxx.16.202.118  datacenter1 rack1       Up     Normal  570.91 KB
25.00%  42535295865117307932921825928971026432
xxx.16.200.190  datacenter1 rack1       Down   Normal  593.51 KB
25.00%  85070591730234615865843651857942052864
xxx.16.200.217  datacenter1 rack1       Down   Normal  617.23 KB
25.00%  127605887595351923798765477786913079296

-----------------------------------------------------
Logs on Node 1
-----------------------------------------------------
DEBUG [Thread-7] 2012-01-04 13:44:00,665 IncomingTcpConnection.java (line
148) Version is now 2
 INFO [HintedHandoff:1] 2012-01-04 13:44:00,850 HintedHandOffManager.java
(line 314) Endpoint /172.16.200.190 died before hint delivery, aborting
DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
(line 303) Checking remote schema before delivering hints
DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
(line 295) schema for /172.16.200.217 matches local schema
DEBUG [HintedHandoff:1] 2012-01-04 13:44:00,851 HintedHandOffManager.java
(line 309) Sleeping 17296ms to stagger hint delivery
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,857 CassandraServer.java (line
670) scan
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
889) restricted ranges for query [-1,-1] are [[-1,0],
(0,42535295865117307932921825928971026432],
(42535295865117307932921825928971026432,85070591730234615865843651857942052864],
(85070591730234615865843651857942052864,127605887595351923798765477786913079296],
(127605887595351923798765477786913079296,-1]]
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,869 StorageProxy.java (line
976) scan ranges are
[-1,0],(0,42535295865117307932921825928971026432],(42535295865117307932921825928971026432,85070591730234615865843651857942052864],(85070591730234615865843651857942052864,127605887595351923798765477786913079296],(127605887595351923798765477786913079296,-1]
DEBUG [Thread-7] 2012-01-04 13:44:00,876 IncomingTcpConnection.java (line
148) Version is now 2
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,881 ReadCallback.java (line 76)
Blockfor/repair is 1/false; setting up requests to /172.16.200.130
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
172.16.200.130
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,884 StorageProxy.java (line
1003) reading org.apache.cassandra.db.IndexScanCommand@c9f997 from /
172.16.202.118
DEBUG [WRITE-/172.16.202.118] 2012-01-04 13:44:00,884
OutboundTcpConnection.java (line 162) attempting to connect to /
172.16.202.118
DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
1550) Primary scan clause is member
DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
1563) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:2] 2012-01-04 13:44:00,887 ColumnFamilyStore.java (line
1605) Scanning index 'Audit_Log.member EQ kamal' starting with
DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 SliceQueryFilter.java (line
123) collecting 0 of 100:
7a7a32323636373030303438303031:false:0@1325704860925009
DEBUG [ReadStage:2] 2012-01-04 13:44:00,893 ColumnFamilyStore.java (line
1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
[7a7a32323636373030303438303031:false:0@1325704860925009,])
DEBUG [ReadStage:2] 2012-01-04 13:44:00,894 IndexScanVerbHandler.java (line
46) Sending RangeSliceReply{rows=} to 171@/172.16.200.130
DEBUG [RequestResponseStage:1] 2012-01-04 13:44:00,895
ResponseVerbHandler.java (line 48) Processing response on a callback from
171@/172.16.200.130
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,895 ReadCallback.java (line 76)
Blockfor/repair is 1/false; setting up requests to /172.16.202.118
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,896 StorageProxy.java (line
1003) reading org.apache.cassandra.db.IndexScanCommand@10eeb26 from /
172.16.202.118
DEBUG [Thread-7] 2012-01-04 13:44:00,912 IncomingTcpConnection.java (line
148) Version is now 2
DEBUG [Thread-7] 2012-01-04 13:44:00,913 IncomingTcpConnection.java (line
148) Version is now 2
DEBUG [RequestResponseStage:3] 2012-01-04 13:44:00,913
ResponseVerbHandler.java (line 48) Processing response on a callback from
172@/172.16.202.118
DEBUG [RequestResponseStage:2] 2012-01-04 13:44:00,913
ResponseVerbHandler.java (line 48) Processing response on a callback from
173@/172.16.202.118
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line 76)
Blockfor/repair is 1/false; setting up requests to
DEBUG [pool-2-thread-1] 2012-01-04 13:44:00,913 ReadCallback.java (line
203) Live nodes  do not satisfy ConsistencyLevel (1 required)
-----------------------------------------------------
Logs on Node 2
-----------------------------------------------------
DEBUG [Thread-6] 2012-01-04 13:44:00,886 Gossiper.java (line 197) Setting
version 2 for /172.16.200.130
DEBUG [Thread-6] 2012-01-04 13:44:00,886 IncomingTcpConnection.java (line
105) set version for /172.16.200.130 to 2
DEBUG [Thread-6] 2012-01-04 13:44:00,895 IncomingTcpConnection.java (line
148) Version is now 2
DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
1550) Primary scan clause is member
DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
1550) Primary scan clause is member
DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
1563) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:3] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
1563) Expanding slice filter to entire row to cover additional expressions
DEBUG [ReadStage:2] 2012-01-04 13:44:00,906 ColumnFamilyStore.java (line
1605) Scanning index 'Audit_Log.member EQ kamal' starting with
DEBUG [ReadStage:3] 2012-01-04 13:44:00,907 ColumnFamilyStore.java (line
1605) Scanning index 'Audit_Log.member EQ kamal' starting with
DEBUG [ReadStage:2] 2012-01-04 13:44:00,909 SliceQueryFilter.java (line
123) collecting 0 of 100:
7a7a32323636373030303438303031:false:0@1325615757889011
DEBUG [ReadStage:2] 2012-01-04 13:44:00,910 ColumnFamilyStore.java (line
1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
[7a7a32323636373030303438303031:false:0@1325615757889011,])
DEBUG [ReadStage:3] 2012-01-04 13:44:00,910 SliceQueryFilter.java (line
123) collecting 0 of 100:
7a7a32323636373030303438303031:false:0@1325615757889011
DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 ColumnFamilyStore.java (line
1617) fetched ColumnFamily(Audit_Log.Audit_Log_member_idx
[7a7a32323636373030303438303031:false:0@1325615757889011,])
DEBUG [ReadStage:2] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line
46) Sending RangeSliceReply{rows=} to 172@/172.16.200.130
DEBUG [ReadStage:3] 2012-01-04 13:44:00,911 IndexScanVerbHandler.java (line
46) Sending RangeSliceReply{rows=} to 173@/172.16.200.130
DEBUG [Thread-4] 2012-01-04 13:44:01,664 IncomingTcpConnection.java (line
148) Version is now 2



On Wed, Jan 4, 2012 at 1:15 AM, aaron morton <aa...@thelastpickle.com>wrote:

> I've not spent much time with the secondary indexes, so a couple of
> questions.
>
> Whats is the output of nodetool ring ?
> Which node were you connected to when you did the get ?
> If you enable DEBUG logging what do the log messages from StorageProxy say
> that contain the string "scan ranges are" and "reading .. from ..."
>
> Now for the wild guessing….It's working as designed for a CL ONE
> request. Looking at test case 5 and *assuming* you were connected to node 2
> this is what I think is happening: A get indexed slice query without a
> start key does not know which nodes will contain the data. Reading the code
> it will consider the replicas for the minimum token as nodes to send the
> query to, for a CL ONE query it will only use one. If you were connected to
> node 2 the query would have executed only on node 2.
>
> This is where I get confused. What happens if you have 50 nodes, with RF
> 3, and you execute a get_indexed_slice at QUOURM with no start_key and the
> only rows that satisfy the query exist on nodes 47, 48 and 49. So they are
> a long way away from the minimum token, assuming SimpleStrategy and well
> ordered token ring.
>
> I think I've missed something, anyone ?
> Cheers
>
>   -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:
>
> Hi Peter,
>
> To test, I wiped out all the data from Cassandra and inserted just one
> record. The row key is "7a7a32323636373030303438303031". I used
> getendpoints to see where my data is and double checked the same using
> sstable2json command.
>
> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4
> nodes cluster. I used cassandra-cli to query the data by using one of the
> secondary index but following are my results:
>
>   Test Node 1 Node 2 Node 3 Node 4 Got data back?  1 Up Up Up Up Yes  2 Up
> Up Up
> Yes  3 Up Up
> Up Yes  4
> Up Up Up Yes  5 Up Up
>
> No  6
>
> Up Up No  7 Up
>
> Up No
>
> It turns out that even though my consistency level is ONE, since I am
> using secondary index to query the data, at least 3 nodes has to be
> running. And out of these 3 running nodes, it works even if one nodes
> contains the data.
>
> Somewhere in the mailing I read that "Iterating through all of the rows
> matching an index clause on your cluster is guaranteed to touch N/RF of the
> nodes in your cluster, because each node only knows about data that is
> indexed locally."
>
> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is
> the number of nodes and 2 is the RF. If it is 2, why is it not returning
> any data when the two nodes that contains the data is running (test #7)?
>
> For my use case, I have to have a RF of 2 and should be able to query
> using secondary index with a CL of ONE. Is this possible when 2 nodes are
> down in a 4 nodes cluster? Is there any limitations on using secondary
> index?
>
> Thanks in advance.
>
> Thanks,
> Kamal
>
> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <
> peter.schuller@infidyne.com> wrote:
>
>> > Thanks for the response Peter! I checked everything and it look good to
>> me.
>> >
>> > I am stuck with this for almost 2 days now. Has anyone had this issue?
>>
>> While it is certainly possible that you're running into a bug, it
>> seems unlikely to me since it is the kind of bug that would affect
>> almost anyone if it is failing with Unavailable due to unrelated (not
>> in replica sets) nodes being down.
>>
>> Can you please post back with (1) the ring layout ('nodetool ring'),
>> and (2) the exact row key that you're testing with?
>>
>> You might also want to run with DEBUG level (modify
>> log4j-server.properties at the top) and the strategy (assuming you are
>> using NetworkTopologyStrategy) will log selected endpoints, and
>> confirm that it's indeed picking endpoints that you think it should
>> based on getendpoints.
>>
>> --
>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>>
>
>
>

Re: Consistency Level

Posted by aaron morton <aa...@thelastpickle.com>.

I've not spent much time with the secondary indexes, so a couple of questions. 

Whats is the output of nodetool ring ? 
Which node were you connected to when you did the get ?
If you enable DEBUG logging what do the log messages from StorageProxy say that contain the string "scan ranges are" and "reading .. from ..."

Now for the wild guessing….It's working as designed for a CL ONE request. Looking at test case 5 and *assuming* you were connected to node 2 this is what I think is happening: A get indexed slice query without a start key does not know which nodes will contain the data. Reading the code it will consider the replicas for the minimum token as nodes to send the query to, for a CL ONE query it will only use one. If you were connected to node 2 the query would have executed only on node 2. 

This is where I get confused. What happens if you have 50 nodes, with RF 3, and you execute a get_indexed_slice at QUOURM with no start_key and the only rows that satisfy the query exist on nodes 47, 48 and 49. So they are a long way away from the minimum token, assuming SimpleStrategy and well ordered token ring. 

I think I've missed something, anyone ?
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/01/2012, at 9:44 AM, Kamal Bahadur wrote:

> Hi Peter,
> 
> To test, I wiped out all the data from Cassandra and inserted just one record. The row key is "7a7a32323636373030303438303031". I used getendpoints to see where my data is and double checked the same using sstable2json command.
> 
> Since the RF is 2, the data is currently on Node 1 and Node 4 of my 4 nodes cluster. I used cassandra-cli to query the data by using one of the secondary index but following are my results:
> 
> Test	 Node 1	 Node 2	 Node 3	 Node 4	 Got data back?
> 1	Up	Up	Up	Up	Yes
> 2	Up	Up	Up	
> Yes
> 3	Up	Up	
> Up	Yes
> 4	
> Up	Up	Up	Yes
> 5	Up	Up	
> 
> No
> 6	
> 
> Up	Up	No
> 7	Up	
> 
> Up	No
> 
> 
> It turns out that even though my consistency level is ONE, since I am using secondary index to query the data, at least 3 nodes has to be running. And out of these 3 running nodes, it works even if one nodes contains the data.
> 
> Somewhere in the mailing I read that "Iterating through all of the rows matching an index clause on your cluster is guaranteed to touch N/RF of the nodes in your cluster, because each node only knows about data that is indexed locally."
> 
> I am not sure what N/RF means in my case. Does it mean 4/2 = 2? where 4 is the number of nodes and 2 is the RF. If it is 2, why is it not returning any data when the two nodes that contains the data is running (test #7)?
> 
> For my use case, I have to have a RF of 2 and should be able to query using secondary index with a CL of ONE. Is this possible when 2 nodes are down in a 4 nodes cluster? Is there any limitations on using secondary index?
> 
> Thanks in advance.
> 
> Thanks,
> Kamal
> 
> On Thu, Dec 29, 2011 at 6:40 PM, Peter Schuller <pe...@infidyne.com> wrote:
> > Thanks for the response Peter! I checked everything and it look good to me.
> >
> > I am stuck with this for almost 2 days now. Has anyone had this issue?
> 
> While it is certainly possible that you're running into a bug, it
> seems unlikely to me since it is the kind of bug that would affect
> almost anyone if it is failing with Unavailable due to unrelated (not
> in replica sets) nodes being down.
> 
> Can you please post back with (1) the ring layout ('nodetool ring'),
> and (2) the exact row key that you're testing with?
> 
> You might also want to run with DEBUG level (modify
> log4j-server.properties at the top) and the strategy (assuming you are
> using NetworkTopologyStrategy) will log selected endpoints, and
> confirm that it's indeed picking endpoints that you think it should
> based on getendpoints.
> 
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>