Posted to user@cassandra.apache.org by Siddharth Verma <ve...@snapdeal.com> on 2016/06/13 12:39:45 UTC

select query on entire primary key returning more than one row in result

Hi,
We are facing this issue in production.
We upgraded our Cassandra from 3.0.3 to 3.5.

When we run a query with the partition key and clustering column (the entire
primary key specified), we get 16 rows in return.

We have 2 DCs, each with RF 3 for our keyspace.

1. We connected with cqlsh, set consistency to LOCAL_ONE, and turned tracing
on. We saw the correct result on 3, and erroneous results on 3.
Correct result : only 1 row
Erroneous result : 16 rows

2. We executed the statement while specifying only the clustering column
with ALLOW FILTERING, and then we got only one record for that partition
key.

3. While upgrading, we dropped the key_cache folder on some nodes, not all.

What could be the causes, and how can we fix this issue?

We speculate that it might be due to the cache.

Any help would be appreciated.

Thanks
Siddharth Verma
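
For readers following along, the check described in point 1 can be sketched in a few lines. This is only an illustration and not part of the original post: it assumes the two groups of 3 are nodes, the node names are hypothetical, and the row counts (1 and 16) are the ones reported above. It relies on the fact that a query restricting the entire primary key can match at most one row.

```python
from collections import defaultdict


def group_by_row_count(row_counts):
    """Group node names by the number of rows each one returned."""
    groups = defaultdict(list)
    for node, count in row_counts.items():
        groups[count].append(node)
    return dict(groups)


def suspect_nodes(row_counts):
    """Return nodes that returned more than one row for a full-primary-key query."""
    return sorted(node for node, count in row_counts.items() if count > 1)


# Hypothetical node names; the counts mirror the report (3 correct, 3 erroneous).
observed = {
    "node1": 1, "node2": 1, "node3": 1,
    "node4": 16, "node5": 16, "node6": 16,
}

print(group_by_row_count(observed))
print(suspect_nodes(observed))
```

In practice the counts would come from repeating the same query against each node with tracing on, as described above; the sketch only organizes the observations.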

Re: select query on entire primary key returning more than one row in result

Posted by Bhuvan Rawal <bh...@gmail.com>.

Joel,

I'd rather thank you for naming 11513 earlier in the mail; I would have been
lost in the code for a much longer time otherwise.

Repeating what Tianshi mentioned in 11513: "Cassandra community is
awesome! Should buy you a beer, Joel." :)

On Wed, Jun 15, 2016 at 6:01 AM, Joel Knighton <jo...@datastax.com>
wrote:

> Great work, Bhuvan - I sat down after work to look at this more carefully.
>
> For a short summary, you are correct.
>
> For a longer summary, I initially thought the reproduction you provided
> would not run into the issue from 3.4/3.5 because it didn't select any
> static columns, which meant that it wouldn't have statics in its
> ColumnFilter (basically, the filter we apply when deciding if we need to
> look for the requested data in more SSTables). This was an incorrect
> understanding - in order to preserve the CQL semantic (see CASSANDRA-6588
> for details), we are including all columns, including the static columns,
> in the fetched columns, which means they are part of the ColumnFilter. I
> believe there may be an opportunity for an optimization here, but that's a
> whole different discussion. I now agree that these are the same issue.
>
> You are correct in your analysis that 3.4/3.5 are the only affected
> versions. It has been patched in release 3.6 forward and was not introduced
> until 3.4
>
> Thanks for sticking with me on this - I'm going to resolve CASSANDRA-12003
> as a duplicate of CASSANDRA-11513.
>
> On Tue, Jun 14, 2016 at 4:21 PM, Bhuvan Rawal <bh...@gmail.com> wrote:
>
>> Joel,
>>
>> Thanks for your reply, I have checked and found that the behavior is same
>> in case of CASSANDRA-11513
>> <https://issues.apache.org/jira/browse/CASSANDRA-11513>. I have verified
>> this behavior (for both 11513 & 12003) to occur in case of 3.4 & 3.5. They
>> both don't occur in 3.0.4, 3.6 & 3.7.
>>
>> Please find below the results of selecting only pk and clustering key
>> from 11513. It has also been verified that both issues occur while
>> selecting all / filtered rows therefore selection criteria is not an issue
>> filtering by WHERE is:
>>
>> cqlsh:ks> select pk,a from test0 where pk=0 and a=2;
>>
>>  pk | a
>> ----+---
>>   0 | 1
>>   0 | 2
>>   0 | 3
>>
>> We can verify this claim by applying 11513 Patch to 3.5 Tag and build &
>> test for 12003. If it is fixed then we can guarantee the claim. Let me
>> know if any further input may possibly be required here.
>>
>> On Wed, Jun 15, 2016 at 2:23 AM, Joel Knighton <
>> joel.knighton@datastax.com> wrote:
>>
>>> The important part of that query is that it's selecting a static column
>>> (with select *), not whether it is filtering on one. In CASSANDRA-12003 and
>>> this thread, it looks like you're only selecting the primary and clustering
>>> columns. I'd be cautious about concluding that CASSANDRA-12003 and
>>> CASSANDRA-11513 are the same issue and that CASSANDRA-12003 is fixed.
>>>
>>> If you have a reproduction path for CASSANDRA-12003, I'd recommend
>>> attaching it to a ticket, and someone can investigate internals to see if
>>> CASSANDRA-11513 (or something else entirely) fixed the issue.
>>>
>>> On Tue, Jun 14, 2016 at 2:13 PM, Bhuvan Rawal <bh...@gmail.com>
>>> wrote:
>>>
>>>> Joel,
>>>>
>>>> If we look at the schema carefully:
>>>>
>>>> CREATE TABLE test0 (
>>>>     pk int,
>>>>     a int,
>>>>     b text,
>>>>     s text static,
>>>>     PRIMARY KEY (pk, a)
>>>> );
>>>>
>>>> the filtering is performed on clustering column a, which is not a static
>>>> column:
>>>>
>>>> select * from test0 where pk=0 and a=2;
>>>>
>>>>
>>>>
>>>> On Wed, Jun 15, 2016 at 12:39 AM, Joel Knighton <
>>>> joel.knighton@datastax.com> wrote:
>>>>
>>>>> It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on
>>>>> you selecting a static column, which you weren't doing in the reported
>>>>> issue. That said, I haven't looked too closely.
>>>>>
>>>>> On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal <bh...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I can reproduce CASSANDRA-11513
>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on
>>>>>> 3.5, possible duplicate.
>>>>>>
>>>>>> On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton <
>>>>>> joel.knighton@datastax.com> wrote:
>>>>>>
>>>>>>> There's some precedent for similar issues with static columns in 3.5
>>>>>>> with https://issues.apache.org/jira/browse/CASSANDRA-11513 - a
>>>>>>> deterministic (or somewhat deterministic) path for reproduction would help
>>>>>>> narrow the issue down farther. I've played around locally with similar
>>>>>>> schemas (sans the stratio indices) and couldn't reproduce the issue.
>>>>>>>
>>>>>>> On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bh...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Jira CASSANDRA-12003
>>>>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-12003> Has been
>>>>>>>> created for the same.
>>>>>>>>
>>>>>>>> On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha <
>>>>>>>> atul.saroha@snapdeal.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Tyler,
>>>>>>>>>
>>>>>>>>> This issue is mainly visible for tables having static columns;
>>>>>>>>> we are still investigating.
>>>>>>>>> We will try to test after removing the lucene index, but I don't think
>>>>>>>>> this plug-in could lead to a change in the behaviour of Cassandra writes
>>>>>>>>> to the table's memtable.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ---------------------------------------------------------------------------------------------------------------------
>>>>>>>>> Atul Saroha
>>>>>>>>> *Lead Software Engineer*
>>>>>>>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>>>>>>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>>>>>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>>>>>>>
>>>>>>>>> On Tue, Jun 14, 2016 at 9:54 PM, Tyler Hobbs <ty...@datastax.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Is 'id' your partition key? I'm not familiar with the stratio
>>>>>>>>>> indexes, but it looks like the primary key columns are both indexed.
>>>>>>>>>> Perhaps this is related?
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <
>>>>>>>>>> atul.saroha@snapdeal.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> After further debugging, this issue appears to be in the in-memory
>>>>>>>>>>> memtable, as running nodetool flush + compact resolves it. No batch
>>>>>>>>>>> writes are used for the table showing the issue.
>>>>>>>>>>> Table properties:
>>>>>>>>>>>
>>>>>>>>>>> WITH CLUSTERING ORDER BY (f_name ASC)
>>>>>>>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>>>>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>>>>>>>     AND comment = ''
>>>>>>>>>>>     AND compaction = {'class':
>>>>>>>>>>>         'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>>>>>>>>         'max_threshold': '32', 'min_threshold': '4'}
>>>>>>>>>>>     AND compression = {'chunk_length_in_kb': '64', 'class':
>>>>>>>>>>>         'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>>>>>>>     AND crc_check_chance = 1.0
>>>>>>>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>>>>>>>     AND default_time_to_live = 0
>>>>>>>>>>>     AND gc_grace_seconds = 864000
>>>>>>>>>>>     AND max_index_interval = 2048
>>>>>>>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>>>>>>>     AND min_index_interval = 128
>>>>>>>>>>>     AND read_repair_chance = 0.0
>>>>>>>>>>>     AND speculative_retry = '99PERCENTILE';
>>>>>>>>>>> CREATE CUSTOM INDEX nbf_index ON nbf () USING
>>>>>>>>>>>     'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds':
>>>>>>>>>>>     '1', 'schema': '{
>>>>>>>>>>>         fields : {
>>>>>>>>>>>             id  : {type : "bigint"},
>>>>>>>>>>>             f_d_name : {
>>>>>>>>>>>                 type           : "string",
>>>>>>>>>>>                 indexed        : true,
>>>>>>>>>>>                 sorted         : false,
>>>>>>>>>>>                 validated      : true,
>>>>>>>>>>>                 case_sensitive : false
>>>>>>>>>>>             }
>>>>>>>>>>>         }
>>>>>>>>>>>     }'};
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jun 13, 2016 at 11:11 PM, Siddharth Verma <
>>>>>>>>>>> verma.siddharth@snapdeal.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> No, all rows were not the same.
>>>>>>>>>>>> Querying only on the partition key gives 20 rows.
>>>>>>>>>>>> In the erroneous result, while querying on partition key and
>>>>>>>>>>>> clustering key, we got 16 of those 20 rows.
>>>>>>>>>>>>
>>>>>>>>>>>> And for "tombstone_threshold" there isn't any entry at
>>>>>>>>>>>> column family level.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Siddharth Verma
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Tyler Hobbs
>>>>>>>>>> DataStax <http://datastax.com/>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> <http://www.datastax.com/>
>>>>>>>
>>>>>>> Joel Knighton
>>>>>>> Cassandra Developer | joel.knighton@datastax.com
>>>>>>>
>>>>>>> <https://www.linkedin.com/company/datastax>
>>>>>>> <https://www.facebook.com/datastax> <https://twitter.com/datastax>
>>>>>>> <https://plus.google.com/+Datastax/about>
>>>>>>> <http://feeds.feedburner.com/datastax>
>>>>>>> <https://github.com/datastax/>
>>>>>>>
>>>>>>> <http://cassandrasummit.org/Email_Signature>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Re: select query on entire primary key returning more than one row in result

Posted by Joel Knighton <jo...@datastax.com>.

Great work, Bhuvan - I sat down after work to look at this more carefully.

For a short summary, you are correct.

For a longer summary, I initially thought the reproduction you provided
would not run into the issue from 3.4/3.5 because it didn't select any
static columns, which meant that it wouldn't have statics in its
ColumnFilter (basically, the filter we apply when deciding if we need to
look for the requested data in more SSTables). This was an incorrect
understanding - in order to preserve the CQL semantic (see CASSANDRA-6588
for details), we are including all columns, including the static columns,
in the fetched columns, which means they are part of the ColumnFilter. I
believe there may be an opportunity for an optimization here, but that's a
whole different discussion. I now agree that these are the same issue.

You are correct in your analysis that 3.4/3.5 are the only affected
versions. The issue was not introduced until 3.4, and it has been patched
from release 3.6 onward.

Thanks for sticking with me on this - I'm going to resolve CASSANDRA-12003
as a duplicate of CASSANDRA-11513.
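
As a quick aid for anyone auditing their clusters, the version range above can be expressed as a small check. This is a minimal sketch, not an official tool: the function names are made up for illustration, and it assumes plain dotted version strings such as "3.5" or "3.0.4" (suffixes like "-SNAPSHOT" are not handled).

```python
def parse_version(version):
    """Turn a dotted version string such as '3.0.4' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def is_affected(version):
    """True for releases in the 3.4 and 3.5 lines, per the summary in this thread."""
    major_minor = parse_version(version)[:2]
    return major_minor in {(3, 4), (3, 5)}


for candidate in ["3.0.3", "3.0.4", "3.4", "3.5", "3.6", "3.7"]:
    status = "affected" if is_affected(candidate) else "not affected"
    print(candidate, status)
```

The check deliberately compares only the major and minor components, since the thread identifies the affected releases by those two numbers.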




Re: select query on entire primary key returning more than one row in result

Posted by Bhuvan Rawal <bh...@gmail.com>.

Joel,

Thanks for your reply. I have checked and found that the behavior is the same
in the case of CASSANDRA-11513
<https://issues.apache.org/jira/browse/CASSANDRA-11513>. I have verified
that this behavior (for both 11513 & 12003) occurs in 3.4 & 3.5. Neither
occurs in 3.0.4, 3.6 & 3.7.

Please find below the results of selecting only the pk and clustering key
from 11513.
It has also been verified that both issues occur whether selecting all or
a filtered set of columns, so the selection criteria are not the issue;
filtering by WHERE is:

cqlsh:ks> select pk,a from test0 where pk=0 and a=2;

 pk | a
----+---
  0 | 1
  0 | 2
  0 | 3

We can verify this claim by applying the 11513 patch to the 3.5 tag, then
building and testing for 12003. If it is fixed, then we can confirm the
claim. Let me know if any further input is required here.
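
To make the anomaly in the output above concrete, here is a small client-side sketch that lists which returned rows do not satisfy the WHERE clause. It is only an illustration: the function name is hypothetical, and the rows and restrictions are copied from the cqlsh session above rather than read from a live cluster.

```python
def unexpected_rows(rows, restrictions):
    """Return the rows that violate at least one equality restriction."""
    return [
        row for row in rows
        if any(row.get(column) != value for column, value in restrictions.items())
    ]


# Rows as printed by cqlsh for: select pk,a from test0 where pk=0 and a=2;
returned = [{"pk": 0, "a": 1}, {"pk": 0, "a": 2}, {"pk": 0, "a": 3}]

extra = unexpected_rows(returned, {"pk": 0, "a": 2})
print(extra)  # [{'pk': 0, 'a': 1}, {'pk': 0, 'a': 3}]
```

On an unaffected version the list of unexpected rows should be empty, since only the row with a=2 matches both restrictions.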


Re: select query on entire primary key returning more than one row in result

Posted by Joel Knighton <jo...@datastax.com>.

The important part of that query is that it's selecting a static column
(with select *), not whether it is filtering on one. In CASSANDRA-12003 and
this thread, it looks like you're only selecting the primary and clustering
columns. I'd be cautious about concluding that CASSANDRA-12003 and
CASSANDRA-11513 are the same issue and that CASSANDRA-12003 is fixed.

If you have a reproduction path for CASSANDRA-12003, I'd recommend
attaching it to a ticket, and someone can investigate internals to see if
CASSANDRA-11513 (or something else entirely) fixed the issue.

On Tue, Jun 14, 2016 at 2:13 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> Joel,
>
> If we look at the schema carefully:
>
> CREATE TABLE test0 (
>     pk int,
>     a int,
>     b text,
>     s text static,
>     PRIMARY KEY (pk, a)
> );
>
> and filtering is performed on the clustering column a, which is not a static
> column:
>
> select * from test0 where pk=0 and a=2;


-- 
Joel Knighton
Cassandra Developer | joel.knighton@datastax.com

Re: select query on entire primary key returning more than one row in result

Posted by Bhuvan Rawal <bh...@gmail.com>.

I have verified this issue to be fixed in 3.6 and 3.7.
And the issue mentioned on this thread is fixed as well.

On Wed, Jun 15, 2016 at 12:43 AM, Bhuvan Rawal <bh...@gmail.com> wrote:

> Joel,
>
> If we look at the schema carefully:
>
> CREATE TABLE test0 (
>     pk int,
>     a int,
>     b text,
>     s text static,
>     PRIMARY KEY (pk, a)
> );
>
> and filtering is performed on the clustering column a, which is not a static
> column:
>
> select * from test0 where pk=0 and a=2;

Re: select query on entire primary key returning more than one row in result

Posted by Bhuvan Rawal <bh...@gmail.com>.

Joel,

If we look at the schema carefully:

CREATE TABLE test0 (
    pk int,
    a int,
    b text,
    s text static,
    PRIMARY KEY (pk, a)
);

and filtering is performed on the clustering column a, which is not a static
column:

select * from test0 where pk=0 and a=2;
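
For anyone who wants to try this against their own build, a check could be sketched as follows. The keyspace name and the inserted values are made up for illustration, and this is not claimed to be the exact reproduction from CASSANDRA-11513; it only spells out the expectation that a query restricted on the full primary key returns at most one row.

```cql
-- Hypothetical single-node keyspace, for local testing only.
CREATE KEYSPACE IF NOT EXISTS repro_ks
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
USE repro_ks;

CREATE TABLE IF NOT EXISTS test0 (
    pk int,
    a int,
    b text,
    s text static,
    PRIMARY KEY (pk, a)
);

-- Three rows in the same partition (pk = 0), sharing one static value.
INSERT INTO test0 (pk, a, b, s) VALUES (0, 1, 'b1', 's0');
INSERT INTO test0 (pk, a, b, s) VALUES (0, 2, 'b2', 's0');
INSERT INTO test0 (pk, a, b, s) VALUES (0, 3, 'b3', 's0');

-- Both pk and a are restricted, so a correct node returns at most one row (a = 2).
SELECT * FROM test0 WHERE pk = 0 AND a = 2;

-- For comparison: restricting only the partition key returns every row in the partition.
SELECT * FROM test0 WHERE pk = 0;
```

If the first SELECT returns more than one row, the node is showing the behaviour discussed in this thread.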



On Wed, Jun 15, 2016 at 12:39 AM, Joel Knighton <jo...@datastax.com>
wrote:

> It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on you
> selecting a static column, which you weren't doing in the reported issue.
> That said, I haven't looked too closely.

Re: select query on entire primary key returning more than one row in result

Posted by Joel Knighton <jo...@datastax.com>.

It doesn't seem to be an exact duplicate - CASSANDRA-11513 relies on you
selecting a static column, which you weren't doing in the reported issue.
That said, I haven't looked too closely.

On Tue, Jun 14, 2016 at 2:07 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> I can reproduce CASSANDRA-11513
> <https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on 3.5,
> possible duplicate.


-- 
Joel Knighton
Cassandra Developer | joel.knighton@datastax.com

Re: select query on entire primary key returning more than one row in result

Posted by Bhuvan Rawal <bh...@gmail.com>.

I can reproduce CASSANDRA-11513
<https://issues.apache.org/jira/browse/CASSANDRA-11513> locally on 3.5,
possible duplicate.

On Wed, Jun 15, 2016 at 12:29 AM, Joel Knighton <jo...@datastax.com>
wrote:

> There's some precedent for similar issues with static columns in 3.5 with
> https://issues.apache.org/jira/browse/CASSANDRA-11513 - a deterministic
> (or somewhat deterministic) path for reproduction would help narrow the
> issue down further. I've played around locally with similar schemas (sans
> the stratio indices) and couldn't reproduce the issue.

Re: select query on entire primary key returning more than one row in result

Posted by Joel Knighton <jo...@datastax.com>.

There's some precedent for similar issues with static columns in 3.5 with
https://issues.apache.org/jira/browse/CASSANDRA-11513 - a deterministic (or
somewhat deterministic) path for reproduction would help narrow the issue
down further. I've played around locally with similar schemas (sans the
stratio indices) and couldn't reproduce the issue.

On Tue, Jun 14, 2016 at 1:41 PM, Bhuvan Rawal <bh...@gmail.com> wrote:

> Jira CASSANDRA-12003
> <https://issues.apache.org/jira/browse/CASSANDRA-12003> has been created
> for this issue.
>
> On Tue, Jun 14, 2016 at 11:54 PM, Atul Saroha <at...@snapdeal.com>
> wrote:
>
>> Hi Tyler,
>>
>> This issue is mainly visible for tables that have static columns; we are
>> still investigating.
>> We will try to test after removing the Lucene index, but I don't think this
>> plug-in could lead to a change in how Cassandra writes to the table's
>> memtable.
>>
>>
>> On Tue, Jun 14, 2016 at 9:54 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>>
>>> Is 'id' your partition key? I'm not familiar with the stratio indexes,
>>> but it looks like the primary key columns are both indexed.  Perhaps this
>>> is related?
>>>
>>> On Tue, Jun 14, 2016 at 1:25 AM, Atul Saroha <at...@snapdeal.com>
>>> wrote:
>>>
>>>> After further debugging, the issue appears to be in the in-memory memtable,
>>>> since running nodetool flush followed by compact resolves it. No batch
>>>> writes are used for the affected table.
>>>> Table properties:
>>>>
>>>> WITH CLUSTERING ORDER BY (f_name ASC)
>>>>>     AND bloom_filter_fp_chance = 0.01
>>>>>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>>>>>     AND comment = ''
>>>>>     AND compaction = {'class':
>>>>> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
>>>>> 'max_threshold': '32', 'min_threshold': '4'}
>>>>>     AND compression = {'chunk_length_in_kb': '64', 'class':
>>>>> 'org.apache.cassandra.io.compress.LZ4Compressor'}
>>>>>     AND crc_check_chance = 1.0
>>>>>     AND dclocal_read_repair_chance = 0.1
>>>>>     AND default_time_to_live = 0
>>>>>     AND gc_grace_seconds = 864000
>>>>>     AND max_index_interval = 2048
>>>>>     AND memtable_flush_period_in_ms = 0
>>>>>     AND min_index_interval = 128
>>>>>     AND read_repair_chance = 0.0
>>>>>     AND speculative_retry = '99PERCENTILE';
>>>>> CREATE CUSTOM INDEX nbf_index ON nbf () USING
>>>>> 'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds':
>>>>> '1', 'schema': '{
>>>>>         fields : {
>>>>>             id  : {type : "bigint"},
>>>>>             f_d_name : {
>>>>>                 type           : "string",
>>>>>                 indexed        : true,
>>>>>                 sorted         : false,
>>>>>                 validated      : true,
>>>>>                 case_sensitive : false
>>>>>             }
>>>>>         }
>>>>>     }'};
>>>>>
>>>>
>>>>
>>>>
>>>> ---------------------------------------------------------------------------------------------------------------------
>>>> Atul Saroha
>>>> *Lead Software Engineer*
>>>> *M*: +91 8447784271 *T*: +91 124-415-6069 *EXT*: 12369
>>>> Plot # 362, ASF Centre - Tower A, Udyog Vihar,
>>>>  Phase -4, Sector 18, Gurgaon, Haryana 122016, INDIA
>>>>
>>>> On Mon, Jun 13, 2016 at 11:11 PM, Siddharth Verma <
>>>> verma.siddharth@snapdeal.com> wrote:
>>>>
>>>>> No, all rows were not the same.
>>>>> Querying only on the partition key gives 20 rows.
>>>>> In the erroneous result, while querying on partition key and
>>>>> clustering key, we got 16 of those 20 rows.
>>>>>
>>>>> And for "*tombstone_threshold"* there isn't any entry at column
>>>>> family level.
>>>>>
>>>>> Thanks,
>>>>> Siddharth Verma
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>
>>
>


-- 

Joel Knighton
Cassandra Developer | joel.knighton@datastax.com

Re: select query on entire primary key returning more than one row in result

Posted by Bhuvan Rawal <bh...@gmail.com>.

Jira CASSANDRA-12003
<https://issues.apache.org/jira/browse/CASSANDRA-12003> has
been created for this issue.

Re: select query on entire primary key returning more than one row in result

Posted by Atul Saroha <at...@snapdeal.com>.

Hi Tyler,

This issue is mainly visible on tables that have static columns; we are
still investigating.
We will test again after removing the Lucene index, but I don’t think this
plug-in could change how Cassandra writes to the table's memtable.

Atul Saroha

Re: select query on entire primary key returning more than one row in result

Posted by Siddharth Verma <ve...@snapdeal.com>.

id is the partition key and f_name is the clustering key.

We weren't querying on the Lucene index.
The Lucene index is on id and f_d_name (another column).

We were facing this issue in production on one column family, because of
which we had to downgrade to 3.0.3.
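
For readers following along, here is a minimal sketch of what this key layout implies, written in Python rather than CQL. It models one partition as a dict keyed by the clustering value, so a lookup on the full primary key (id plus f_name) can address at most one row. The class names, the static column name and the sample data are assumptions for illustration only; none of them come from the actual table.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Partition:
    """Toy model of one partition: static values plus rows keyed by f_name."""
    static: Dict[str, Any] = field(default_factory=dict)            # shared by every row
    rows: Dict[str, Dict[str, Any]] = field(default_factory=dict)   # f_name -> regular columns


class ToyTable:
    """Hypothetical in-memory stand-in for a table with PRIMARY KEY (id, f_name)."""

    def __init__(self) -> None:
        self._partitions: Dict[int, Partition] = {}

    def insert(self, id_: int, f_name: str, **columns: Any) -> None:
        part = self._partitions.setdefault(id_, Partition())
        part.rows.setdefault(f_name, {}).update(columns)

    def set_static(self, id_: int, **columns: Any) -> None:
        self._partitions.setdefault(id_, Partition()).static.update(columns)

    def select(self, id_: int, f_name: Optional[str] = None) -> List[Dict[str, Any]]:
        """Return all rows of a partition, or at most one row if f_name is given.

        Simplification: a partition that holds only static values yields no rows here.
        """
        part = self._partitions.get(id_)
        if part is None:
            return []
        if f_name is None:
            names = sorted(part.rows)          # clustering order ASC
        else:
            names = [f_name] if f_name in part.rows else []
        return [{"id": id_, "f_name": n, **part.static, **part.rows[n]} for n in names]


if __name__ == "__main__":
    table = ToyTable()
    for i in range(20):
        table.insert(1, f"name{i:02d}", f_d_name=f"display {i}")
    table.set_static(1, s_col="shared")        # s_col is a made-up static column
    print(len(table.select(1)))                # 20
    print(len(table.select(1, "name05")))      # 1
```

In this model, a query that names both id and f_name and still comes back with 16 rows cannot happen, which is why the behaviour described in this thread looks like a server-side bug rather than a data-modelling problem.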

Re: select query on entire primary key returning more than one row in result

Posted by Tyler Hobbs <ty...@datastax.com>.

Is 'id' your partition key? I'm not familiar with the stratio indexes, but
it looks like the primary key columns are both indexed.  Perhaps this is
related?

-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: select query on entire primary key returning more than one row in result

Posted by Atul Saroha <at...@snapdeal.com>.

After further debugging, the issue appears to lie in the in-memory memtable,
since running nodetool flush followed by compact resolves it. No batch
writes are used for the table that shows the issue.
Table properties:

WITH CLUSTERING ORDER BY (f_name ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class':
        'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
        'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class':
        'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE CUSTOM INDEX nbf_index ON nbf () USING
'com.stratio.cassandra.lucene.Index' WITH OPTIONS = {'refresh_seconds':
'1', 'schema': '{
        fields : {
            id  : {type : "bigint"},
            f_d_name : {
                type           : "string",
                indexed        : true,
                sorted         : false,
                validated      : true,
                case_sensitive : false
            }
        }
    }'};


Atul Saroha

Re: select query on entire primary key returning more than one row in result

Posted by Siddharth Verma <ve...@snapdeal.com>.

No, the rows were not all the same.
Querying on the partition key alone returns 20 rows.
In the erroneous result, querying on the partition key and the clustering
key returned 16 of those 20 rows.

As for "tombstone_threshold", there is no entry for it at the column family
level.

Thanks,
Siddharth Verma
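
A client-side sanity check along these lines could flag the erroneous responses described above. This is only a sketch in Python that works on rows already fetched as dicts; the function name and the sample rows are hypothetical, and it makes no assumption about which driver returned them.

```python
from typing import Any, Dict, Iterable, List


def rows_violating_full_key(rows: Iterable[Dict[str, Any]],
                            key: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the rows that should not appear in a full-primary-key result.

    A row is a violation if any key column differs from the requested value,
    or if it is a second (or later) row that matches the requested key.
    """
    violations: List[Dict[str, Any]] = []
    seen_match = False
    for row in rows:
        matches = all(row.get(col) == val for col, val in key.items())
        if not matches or seen_match:
            violations.append(row)
        else:
            seen_match = True
    return violations


if __name__ == "__main__":
    # Made-up data shaped like the report: 16 rows came back for one full key.
    returned = [{"id": 1, "f_name": f"name{i:02d}"} for i in range(16)]
    bad = rows_violating_full_key(returned, {"id": 1, "f_name": "name03"})
    print(len(bad))  # 15
```

An empty list means the response is consistent with a full-primary-key lookup; anything else is a row that the server should have filtered out.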

Re: select query on entire primary key returning more than one row in result

Posted by Anshu Vajpayee <an...@gmail.com>.

Were all the rows the same? If not, what was different?

What was the droppable tombstone compaction ratio for that table/CF?

-- 
Regards,
Anshu

Re: select query on entire primary key returning more than one row in result

Posted by Siddharth Verma <ve...@snapdeal.com>.

Running nodetool compact fixed the issue.

Could someone help explain why it occurred?