Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Created) (JIRA)" <ji...@apache.org> on 2012/02/29 18:27:58 UTC

[jira] [Created] (CASSANDRA-3982) Explore not returning range ghosts

Explore not returning range ghosts
----------------------------------

                 Key: CASSANDRA-3982
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3982
             Project: Cassandra
          Issue Type: Sub-task
            Reporter: Sylvain Lebresne
             Fix For: 1.2


This ticket proposes to remove range ghosts in CQL3.
The basic argument is that range ghosts confuse users a lot and don't add any value, since a range ghost doesn't allow distinguishing between the following two cases:
* the row is deleted
* the row is not deleted but doesn't have data for the provided filter

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (CASSANDRA-3982) Explore not returning range ghosts

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-3982:
----------------------------------------

    Attachment: 3982.txt

I think that what is currently returned by CQL3 is not consistent with respect to handling "null" values. Let me illustrate 2 problems with examples. Consider the following table:
{noformat}
CREATE TABLE test (
    k int PRIMARY KEY,
    c1 int,
    c2 int
);
{noformat}
First, suppose it contains:
{noformat}
 k | c1 | c2
---+----+----
 0 | 0  | 0
 1 | 1  | 1
 2 | 2  | 2
{noformat}
then {{SELECT k FROM test}} will return:
{noformat}
 k
---
 0
 1
 2
{noformat}
But if we do
{noformat}
DELETE FROM test WHERE k=2;
{noformat}
then the same SELECT will still return the same result (i.e. 2 will still show up). Of course this is just the good ol' range ghost problem, but I want to illustrate that while this was "merely" unintuitive in thrift, this is imho just wrong in CQL. I think that we should define a (CQL) row as existing only if it contains at least one non-primary-key column with a value. And of course, we shouldn't return rows that don't exist.
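
As a rough illustration of the proposed definition (a sketch only, not the attached patch; the dictionary layout and the names {{row_exists}} and {{select_keys}} are assumptions made for this example), a row can be modelled as a mapping from non-primary-key column names to values, with a deleted row leaving an empty mapping behind:

```python
# Hypothetical model: partition key -> {non-primary-key column: value}.
# A row removed with DELETE keeps its key but has no live columns left,
# which is what surfaces as a range ghost.
table = {
    0: {"c1": 0, "c2": 0},
    1: {"c1": 1, "c2": 1},
    2: {},  # state after DELETE FROM test WHERE k=2
}


def row_exists(columns):
    """A (CQL) row exists only if at least one non-primary-key column has a value."""
    return any(value is not None for value in columns.values())


def select_keys(table):
    """Model of SELECT k FROM test that skips range ghosts."""
    return [k for k, columns in sorted(table.items()) if row_exists(columns)]


print(select_keys(table))  # [0, 1]
```

Under this definition the deleted key 2 no longer appears, while rows 0 and 1 are returned as before.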

The second problem we have is not due to range ghosts. Consider the same table and say that it now contains:
{noformat}
 k | c1 | c2
---+----+----
 0 | 0  | 0
 1 | 1  |
 2 | 2  | 2
{noformat}
i.e. the second row has no value for c2. If we do
{noformat}
SELECT c2 FROM test
{noformat}
then currently this returns
{noformat}
 c2
----
 0
 null
 2
{noformat}
The null returned here is because RangeSlice returns an empty ColumnFamily when the filter matches nothing. However,
{noformat}
SELECT c2 FROM test WHERE k = 1
{noformat}
doesn't return anything, because the filter selects only c2 and getSlice returns a null ColumnFamily in that case. In particular, it *does not* return a single "null" result, which is inconsistent with the result of the previous query.

Anyway, I think there are two possible approaches to unify this:
# take the "SQL" approach and say that a select returns every row that matches the WHERE clause, independently of whether the selected columns exist or not. In that approach, the last query above should return a null.
# define that a row is included in the result set only if it has at least one non-null value in the *selected* columns. I.e. neither of the two queries above should return a null.
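
To make the difference between the two approaches concrete, here is a minimal sketch in Python. It is only a model of the semantics, not Cassandra code: the dictionary layout and the function names {{select_sql_style}} and {{select_non_null_only}} are hypothetical, and the existence check assumes the definition proposed earlier (a row exists if it has at least one non-primary-key column with a value).

```python
# Hypothetical model of the second example table: row 1 has no value for c2.
table = {
    0: {"c1": 0, "c2": 0},
    1: {"c1": 1},
    2: {"c1": 2, "c2": 2},
}


def select_sql_style(table, selected, key=None):
    """Approach 1: return every existing row matching the WHERE clause,
    padding missing selected columns with None (null)."""
    result = []
    for k, columns in sorted(table.items()):
        if key is not None and k != key:
            continue  # WHERE k = <key> does not match
        if not columns:
            continue  # no live column at all: the row does not exist
        result.append(tuple(columns.get(name) for name in selected))
    return result


def select_non_null_only(table, selected, key=None):
    """Approach 2: keep a row only if at least one selected column is non-null."""
    return [row for row in select_sql_style(table, selected, key)
            if any(value is not None for value in row)]


print(select_sql_style(table, ["c2"]))             # [(0,), (None,), (2,)]
print(select_sql_style(table, ["c2"], key=1))      # [(None,)]
print(select_non_null_only(table, ["c2"]))         # [(0,), (2,)]
print(select_non_null_only(table, ["c2"], key=1))  # []
```

Either approach gives the same answer for the range query and the single-key query on row 1, which is the consistency that is missing today.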

I actually think that we should pick the "SQL" approach because 1) doing otherwise would be much too unintuitive to people coming from SQL and 2) it's better defined when you select only primary key columns.

I note however that there is a downside to that solution: it means that when selecting one column, internally we should query all the columns for the (CQL) row, not just the selected ones, to know if the row exists. However, a solution to that will be to introduce 'IS NOT NULL' (#3783).

Anyway, attaching a patch that 1) considers rows to exist only if they have at least one column matching the where clause and 2) uses the "SQL" approach above. Tests have been pushed in the dtests.

                
> Explore not returning range ghosts
> ----------------------------------
>
>                 Key: CASSANDRA-3982
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3982
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API
>    Affects Versions: 1.1.0
>            Reporter: Sylvain Lebresne
>              Labels: cql3
>             Fix For: 1.1.1
>
>         Attachments: 3982.txt

        

[jira] [Commented] (CASSANDRA-3982) Explore not returning range ghosts

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273684#comment-13273684 ] 

Jonathan Ellis commented on CASSANDRA-3982:
-------------------------------------------

This makes sense to me, primarily because CQL3 divorces the logical row from the physical row, so "checking all columns" is a reasonable price to pay now where before it was not.

bq. a solution to that will be to introduce 'IS NOT NULL' (#3783)

How would that address this?
                
> Explore not returning range ghosts
> ----------------------------------
>
>                 Key: CASSANDRA-3982
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3982
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API
>    Affects Versions: 1.1.0
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: cql3
>             Fix For: 1.1.1
>
>         Attachments: 3982.txt

        

[jira] [Commented] (CASSANDRA-3982) Explore not returning range ghosts

Posted by "Christoph Tavan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255463#comment-13255463 ] 

Christoph Tavan commented on CASSANDRA-3982:
--------------------------------------------

When working on the helenus driver for node.js I recently stumbled upon this problem. I found that for "static" column families I was getting not only row ghosts but also column ghosts. See this example that uses {{cqlsh -3}} from the 1.1.0-rc1 release:

{code}
$ cqlsh -3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.0~rc1 | CQL spec 3.0.0 | Thrift protocol 19.30.0]
Use HELP for help.
cqlsh> CREATE KEYSPACE helenus_cql3_test WITH strategy_class=SimpleStrategy AND strategy_options:replication_factor=1;
cqlsh> USE helenus_cql3_test ;
cqlsh:helenus_cql3_test> CREATE COLUMNFAMILY cql_test (id text, foo text, PRIMARY KEY (id));
cqlsh:helenus_cql3_test> UPDATE cql_test SET foo='bar' WHERE id='foobar';
cqlsh:helenus_cql3_test> SELECT * FROM cql_test;
 id     | foo
--------+-----
 foobar | bar

cqlsh:helenus_cql3_test> DELETE FROM cql_test WHERE id='foobar';
cqlsh:helenus_cql3_test> SELECT * FROM cql_test;
 id     | foo
--------+------
 foobar | null
{code}

As you can see, the result contains not only the primary key (i.e. the row key) as a ghost; all columns that have been defined in the schema are also included, with a value of null.

I think it would be highly desirable if ghosts never showed up in any CQL result.
                

        

[jira] [Commented] (CASSANDRA-3982) Explore not returning range ghosts

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13273875#comment-13273875 ] 

Sylvain Lebresne commented on CASSANDRA-3982:
---------------------------------------------

bq. How would that address this?

Let's say that it would make it possible to write queries like:
{noformat}
SELECT c12, c24 FROM cf WHERE c12 IS NOT NULL AND c24 IS NOT NULL
{noformat}
and for that (if all selected columns are queried with 'is not null'), we can use a name filter with only c12 and c24. So by "address", I rather meant "offer an option to the (advanced) user to only query some columns internally".
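
The decision described above could be sketched as follows. This is an assumption-laden illustration, not the patch: {{IS NOT NULL}} is only proposed in #3783, and the function {{columns_to_fetch}} and its parameters are hypothetical names used for this example.

```python
def columns_to_fetch(selected, not_null, all_columns):
    """Return the non-primary-key columns that must be read internally.

    selected:    columns in the SELECT clause
    not_null:    columns constrained by IS NOT NULL in the WHERE clause
    all_columns: every non-primary-key column of the table
    """
    if selected and set(selected) <= set(not_null):
        # A matching row must have a value in each selected column, so
        # reading only the constrained columns decides both whether the
        # row exists and what to return: a name filter is enough.
        return sorted(set(not_null))
    # Otherwise the row may exist through a column that is not selected,
    # so every column has to be read to decide whether the row exists.
    return sorted(all_columns)


all_columns = ["c1", "c12", "c24", "c3"]
print(columns_to_fetch(["c12", "c24"], ["c12", "c24"], all_columns))  # ['c12', 'c24']
print(columns_to_fetch(["c12"], [], all_columns))  # ['c1', 'c12', 'c24', 'c3']
```

In other words, the narrower read is only safe when every selected column is itself required to be non-null; in all other cases the "SQL" approach still needs the full row.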
                

        

[jira] [Commented] (CASSANDRA-3982) Explore not returning range ghosts

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13274774#comment-13274774 ] 

Jonathan Ellis commented on CASSANDRA-3982:
-------------------------------------------

Just to verify, this also addresses CASSANDRA-3505 as a byproduct of "at least one column must be present," right?
                