Posted to user@cassandra.apache.org by sam_ <am...@yahoo.com> on 2011/04/15 08:43:00 UTC

Duplicate result of get_indexed_slices, depending on indexClause.count

Hi All,

I have been using Cassandra 0.7.2 and 0.7.4 with the Thrift API (from Java).

I noticed that when I query a Column Family with indexed columns,
get_indexed_slices sometimes returns a duplicate row, depending on the
number of rows in the CF and the value I set in IndexClause.count.
It also depends on the order of the rows in the CF.

For example consider the following CF that I call Attributes:

create column family Attributes with comparator=UTF8Type
	and column_metadata=[
		{column_name: range_id, validation_class: LongType, index_type: KEYS},
		{column_name: attr_key, validation_class: UTF8Type, index_type: KEYS},
		{column_name: attr_val, validation_class: BytesType, index_type: KEYS}
	];

And suppose I have the following rows in the CF:

key         range_id    attr_key    attr_val
"1/@1/0"    1           "A"         "1"
"1/5/0"     1           "B"         "1000"
"3/@1/0"    2           "A"         "1"
"3/5/0"     2           "B"         "1001"
"5/@1/0"    3           "A"         "2"
"5/5/0"     3           "B"         "1002"
"7/@1/0"    4           "A"         "2"
"7/5/0"     4           "B"         "1003"

Now if I have a query with IndexClause like this (in pseudo code):

attr_key == "A" AND attr_val == "1"

with indexClause.count = 4;

Then I will get rows with the following keys from get_indexed_slices:

"1/@1/0", "3/@1/0", "3/@1/0"

The last key is a duplicate!
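
As a stopgap on the client side, the duplicate could be dropped after the
call returns. The following is only a minimal sketch in plain Java, not part
of the Cassandra API: the class IndexedSliceDedup and the method dedupeKeys
are hypothetical names, and it assumes the row keys have already been
extracted as strings from the KeySlice list that get_indexed_slices returns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of Cassandra or its Thrift bindings.
class IndexedSliceDedup {

    // Removes repeated row keys while keeping the order of first occurrence.
    // LinkedHashSet keeps insertion order and ignores keys it already holds.
    static List<String> dedupeKeys(List<String> keys) {
        return new ArrayList<String>(new LinkedHashSet<String>(keys));
    }

    public static void main(String[] args) {
        // The keys reported above, including the duplicate.
        List<String> returned = Arrays.asList("1/@1/0", "3/@1/0", "3/@1/0");
        System.out.println(dedupeKeys(returned)); // prints [1/@1/0, 3/@1/0]
    }
}
```

This only hides the symptom on the client. After de-duplication the result
can hold fewer rows than indexClause.count even when more matching rows exist.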

This is very sensitive to the order of the rows in the CF, the number of rows,
and the value of indexClause.count. I noticed that when the number of rows in
the CF is twice indexClause.count, the issue might happen, depending on the
order of the rows in the CF.

This seems to be a bug, and it occurs in both 0.7.2 and 0.7.4.

Is there a solution to this problem? 

Many Thanks,
Sam


Re: Duplicate result of get_indexed_slices, depending on indexClause.count

Posted by Jonathan Ellis <jb...@gmail.com>.

https://issues.apache.org/jira/browse/CASSANDRA-2406


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com