You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Jean-Armel Luce <ja...@gmail.com> on 2012/08/20 08:33:00 UTC

Secondary index and/or row key in the read path ?

Hello,

I am using Cassandra 1.1.1 and CQL3.

Could you tell me what is the best strategy for retrieving a row using a
condition on a row key (operator =) and also filter on a 2nd column?

For example, I create a  table named "testwhere" with a row key on column
"mykey" and 2 other columns "col1" and "col2".

I would like to retrieve the row with the key 'key1' only if col1 = 'abcd'
I send the request  SELECT mykey, col1 from testwhere where mykey = 'key1'
and col1 = 'abcd';
As you can see, the 1st condition in the WHERE clause is based on the row
key.
However the request doesn't work if no secondary index is created on the
column used in the 2nd condition of the WHERE clause. It works only if a
secondary indexed is created on this 2nd column (see below).
Does that mean that the secondary index is used in the read path instead of
the row key, even if there is a condition on the row key in the WHERE
clause ?

Here is an example :

jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
Connected to Test Cluster at localhost:9160.
[cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
Use HELP for help.
cqlsh> use test1;
cqlsh:test1> CREATE TABLE testwhere (mykey varchar PRIMARY KEY,
         ...  col1 varchar,
         ...  col2 varchar);
cqlsh:test1> INSERT INTO testwhere (mykey, col1, col2) VALUES ('key1',
'abcd', 'efgh');

cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1';
 mykey | col1
-------+------
  key1 | abcd

cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and
col1 = 'abcd';
Bad Request: No indexed columns present in by-columns clause with Equal
operator

cqlsh:test1> CREATE INDEX col1_idx ON testwhere (col1);
cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and
col1 = 'abcd';
 mykey | col1
-------+------
  key1 | abcd

cqlsh:test1>


My understanding is :
The 1st SELECT is working because there is only the row key in the WHERE
clause
The 2nd SELECT is not working because the row key is in the WHERE clause,
but there is no index on col1
The 3rd SELECT (which is the same as the 2nd SELECT) is working because the
row key is in the WHERE clause, and a secondary index is created on col1


For this use case, what are the recommendations of the Cassandra community ?
- do we need to create a secondary index for each column we want to filter ?
- do we need to post-process (filter) the result of the query in our
application ?
- or is there another solution ?


Thanks.

Jean-Armel

Re: Secondary index and/or row key in the read path ?

Posted by Jean-Armel Luce <ja...@gmail.com>.

Hi Aaron,

Thank you for your answer.

So, I shall do post-processing for selecting a row using a row key *and*
applying a column level filter.

Best Regards,
Jean-Armel

2012/8/21 aaron morton <aa...@thelastpickle.com>

> - do we need to post-process (filter) the result of the query in our
> application ?
>
> Thats the one :)
>
> Right now the code paths don't exist to select a row using a row key *and*
> apply a column level filter. The RPC API does not work that way and I'm not
> sure if this is something that is planned for CQL.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 20/08/2012, at 6:33 PM, Jean-Armel Luce <ja...@gmail.com> wrote:
>
>
> Hello,
>
> I am using Cassandra 1.1.1 and CQL3.
>
> Could you tell me what is the best strategy for retrieving a row using a
> condition on a row key (operator =) and also filter on a 2nd column?
>
> For example, I create a  table named "testwhere" with a row key on column
> "mykey" and 2 other columns "col1" and "col2".
>
> I would like to retrieve the row with the key 'key1' only if col1 =
> 'abcd'
> I send the request  SELECT mykey, col1 from testwhere where mykey = 'key1'
> and col1 = 'abcd';
> As you can see, the 1st condition in the WHERE clause is based on the row
> key.
> However the request doesn't work if no secondary index is created on the
> column used in the 2nd condition of the WHERE clause. It works only if a
> secondary indexed is created on this 2nd column (see below).
> Does that mean that the secondary index is used in the read path instead
> of the row key, even if there is a condition on the row key in the WHERE
> clause ?
>
> Here is an example :
>
> jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
> Connected to Test Cluster at localhost:9160.
> [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
> Use HELP for help.
> cqlsh> use test1;
> cqlsh:test1> CREATE TABLE testwhere (mykey varchar PRIMARY KEY,
>          ...  col1 varchar,
>          ...  col2 varchar);
> cqlsh:test1> INSERT INTO testwhere (mykey, col1, col2) VALUES ('key1',
> 'abcd', 'efgh');
>
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1';
>  mykey | col1
> -------+------
>   key1 | abcd
>
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and
> col1 = 'abcd';
> Bad Request: No indexed columns present in by-columns clause with Equal
> operator
>
> cqlsh:test1> CREATE INDEX col1_idx ON testwhere (col1);
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and
> col1 = 'abcd';
>  mykey | col1
> -------+------
>   key1 | abcd
>
> cqlsh:test1>
>
>
> My understanding is :
> The 1st SELECT is working because there is only the row key in the WHERE
> clause
> The 2nd SELECT is not working because the row key is in the WHERE clause,
> but there is no index on col1
> The 3rd SELECT (which is the same as the 2nd SELECT) is working because
> the row key is in the WHERE clause, and a secondary index is created on col1
>
>
> For this use case, what are the recommendations of the Cassandra community
> ?
> - do we need to create a secondary index for each column we want to filter
> ?
> - do we need to post-process (filter) the result of the query in our
> application ?
> - or is there another solution ?
>
>
> Thanks.
>
> Jean-Armel
>
>
>

Re: Secondary index and/or row key in the read path ?

Posted by aaron morton <aa...@thelastpickle.com>.

> - do we need to post-process (filter) the result of the query in our application ?
Thats the one :)

Right now the code paths don't exist to select a row using a row key *and* apply a column level filter. The RPC API does not work that way and I'm not sure if this is something that is planned for CQL. 

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 20/08/2012, at 6:33 PM, Jean-Armel Luce <ja...@gmail.com> wrote:

> 
> Hello,
> 
> I am using Cassandra 1.1.1 and CQL3.
> 
> Could you tell me what is the best strategy for retrieving a row using a condition on a row key (operator =) and also filter on a 2nd column?
> 
> For example, I create a  table named "testwhere" with a row key on column "mykey" and 2 other columns "col1" and "col2".
>   
> I would like to retrieve the row with the key 'key1' only if col1 = 'abcd'  
> I send the request  SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 = 'abcd';
> As you can see, the 1st condition in the WHERE clause is based on the row key.
> However the request doesn't work if no secondary index is created on the column used in the 2nd condition of the WHERE clause. It works only if a secondary indexed is created on this 2nd column (see below). 
> Does that mean that the secondary index is used in the read path instead of the row key, even if there is a condition on the row key in the WHERE clause ?
> 
> Here is an example :
> 
> jal@jal-VirtualBox:~/cassandra/apache-cassandra-1.1.1/bin$ ./cqlsh -3
> Connected to Test Cluster at localhost:9160.
> [cqlsh 2.2.0 | Cassandra 1.1.1 | CQL spec 3.0.0 | Thrift protocol 19.32.0]
> Use HELP for help.
> cqlsh> use test1;
> cqlsh:test1> CREATE TABLE testwhere (mykey varchar PRIMARY KEY,
>          ...  col1 varchar,
>          ...  col2 varchar);
> cqlsh:test1> INSERT INTO testwhere (mykey, col1, col2) VALUES ('key1', 'abcd', 'efgh');
> 
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1';
>  mykey | col1
> -------+------
>   key1 | abcd
> 
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 = 'abcd';
> Bad Request: No indexed columns present in by-columns clause with Equal operator
> 
> cqlsh:test1> CREATE INDEX col1_idx ON testwhere (col1);
> cqlsh:test1>  SELECT mykey, col1 from testwhere where mykey = 'key1' and col1 = 'abcd';
>  mykey | col1
> -------+------
>   key1 | abcd
> 
> cqlsh:test1>
> 
> 
> My understanding is :
> The 1st SELECT is working because there is only the row key in the WHERE clause
> The 2nd SELECT is not working because the row key is in the WHERE clause, but there is no index on col1
> The 3rd SELECT (which is the same as the 2nd SELECT) is working because the row key is in the WHERE clause, and a secondary index is created on col1
> 
> 
> For this use case, what are the recommendations of the Cassandra community ?
> - do we need to create a secondary index for each column we want to filter ?
> - do we need to post-process (filter) the result of the query in our application ?
> - or is there another solution ?
> 
> 
> Thanks.
> 
> Jean-Armel