Posted to commits@cassandra.apache.org by "julien campan (JIRA)" <ji...@apache.org> on 2012/10/30 10:56:12 UTC

[jira] [Created] (CASSANDRA-4877) Range queries return fewer result after a lot of delete

julien campan created CASSANDRA-4877:
----------------------------------------

             Summary: Range queries return fewer result after a lot of delete
                 Key: CASSANDRA-4877
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4877
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 1.2.0 beta 1
            Reporter: julien campan


Hi, I'm testing the trunk version.
I'm using: [cqlsh 2.3.0 | Cassandra 1.2.0-beta1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.35.0]

My use case is:
I create a table:
CREATE TABLE premier (
    id int PRIMARY KEY,
    value int
) WITH
    comment='' AND
    caching='KEYS_ONLY' AND
    read_repair_chance=0.100000 AND
    dclocal_read_repair_chance=0.000000 AND
    gc_grace_seconds=864000 AND
    replicate_on_write='true' AND
    compression={'sstable_compression': 'SnappyCompressor'};

1) I insert 10 000 000 rows (of the form id = 1 and value = 1).
2) I delete 2 000 000 rows (choosing the key values at random).
3) I do select * from premier; and my result is 7944 rows instead of 10 000 (the default LIMIT).
4) If I do select * from premier limit 20000; my result is 15839 rows.

So after a lot of deletes, range queries no longer return the right number of results.
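The numbers line up with tombstoned rows being counted against the query's row limit: with 20% of the rows deleted, a limit of 10 000 yields roughly 8 000 live rows. A minimal Python simulation of that counting error (an illustration only, not Cassandra code; the function and parameter names are invented):

```python
import random

def range_query(rows, limit, count_tombstones):
    """Simulated range slice over (key, is_deleted) pairs in token order.

    count_tombstones=True models the buggy behaviour: deleted rows are
    counted toward the limit even though they are never returned.
    count_tombstones=False models the fix: only live rows count.
    """
    result = []
    counted = 0
    for key, is_deleted in rows:
        if counted >= limit:
            break
        if not is_deleted:
            result.append(key)
        if count_tombstones or not is_deleted:
            counted += 1
    return result

random.seed(4877)
# Scaled-down version of the report: rows with 20% randomly deleted.
rows = [(k, random.random() < 0.2) for k in range(100_000)]

buggy = range_query(rows, 10_000, count_tombstones=True)
fixed = range_query(rows, 10_000, count_tombstones=False)
print(len(buggy))  # roughly 8 000, like the 7944 in the report
print(len(fixed))  # 10000
```

The buggy variant stops after scanning 10 000 rows regardless of liveness, which is why the client sees fewer results than the limit.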

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-4877) Range queries return fewer result after a lot of delete

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-4877:
----------------------------------------

    Attachment: 0002-Rename-maxIsColumns-to-countCQL3Rows.patch
                0001-4877.patch

Attaching a patch to fix this. The problem is that our handling of LIMIT was still not correct, in particular when a NamesQueryFilter was used, as deleted rows were wrongly counted toward the limit.

One problem with that patch is that we may still under-count in a mixed 1.1/1.2 cluster, because 1.1 nodes won't know how to count correctly. That's unfortunate, but changing this in 1.1 would be hard and dangerous, and CQL3 is beta in 1.1 after all.

Note that I'm attaching 2 patches. The first one is the bulk of the fix. The second one mostly renames the 'maxIsColumns' parameter, used in a number of places, to 'countCQL3Rows', because that describes more faithfully what the parameter actually does.
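For context on the rename: in Cassandra's storage engine a single CQL3 row is stored as several internal columns (cells), so a count limit can mean "n columns" or "n CQL3 rows". A small, hypothetical Python illustration of the two counting modes (the `take` function and its parameters are invented for this sketch, not Cassandra's API):

```python
# Each internal cell belongs to a CQL3 row identified by its row key.
cells = [
    ("row1", "col_a"), ("row1", "col_b"),
    ("row2", "col_a"), ("row2", "col_b"),
    ("row3", "col_a"), ("row3", "col_b"),
]

def take(cells, limit, count_cql3_rows):
    """Take cells until `limit` units are counted.

    count_cql3_rows=False: every cell counts (the old 'maxIsColumns' sense).
    count_cql3_rows=True: a unit is a whole CQL3 row, i.e. the group of
    cells sharing a row key (the 'countCQL3Rows' sense).
    """
    out, counted, current_row = [], 0, None
    for row, col in cells:
        if count_cql3_rows:
            if row != current_row:
                counted += 1
                current_row = row
        else:
            counted += 1
        if counted > limit:
            break
        out.append((row, col))
    return out

print(len(take(cells, 2, count_cql3_rows=False)))  # 2 cells
print(len(take(cells, 2, count_cql3_rows=True)))   # 4 cells (2 full rows)
```

Counting whole CQL3 rows rather than cells is what makes a user-facing LIMIT behave as users expect.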
                
> Range queries return fewer result after a lot of delete
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4877
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4877
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0 beta 1
>            Reporter: julien campan
>            Assignee: Sylvain Lebresne
>             Fix For: 1.2.0 rc1
>
>         Attachments: 0001-4877.patch, 0002-Rename-maxIsColumns-to-countCQL3Rows.patch


[jira] [Assigned] (CASSANDRA-4877) Range queries return fewer result after a lot of delete

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne reassigned CASSANDRA-4877:
-------------------------------------------

    Assignee: Sylvain Lebresne
    
> Range queries return fewer result after a lot of delete
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4877
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4877
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0 beta 1
>            Reporter: julien campan
>            Assignee: Sylvain Lebresne


[jira] [Commented] (CASSANDRA-4877) Range queries return fewer result after a lot of delete

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501900#comment-13501900 ] 

Jonathan Ellis commented on CASSANDRA-4877:
-------------------------------------------

+1
                
> Range queries return fewer result after a lot of delete
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4877
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4877
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0 beta 1
>            Reporter: julien campan
>            Assignee: Sylvain Lebresne
>             Fix For: 1.2.0 rc1
>
>         Attachments: 0001-4877.patch, 0002-Rename-maxIsColumns-to-countCQL3Rows.patch


[jira] [Updated] (CASSANDRA-4877) Range queries return fewer result after a lot of delete

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-4877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-4877:
----------------------------------------

    Fix Version/s: 1.2.0 rc1
    
> Range queries return fewer result after a lot of delete
> -------------------------------------------------------
>
>                 Key: CASSANDRA-4877
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4877
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0 beta 1
>            Reporter: julien campan
>            Assignee: Sylvain Lebresne
>             Fix For: 1.2.0 rc1
