Posted to user@cassandra.apache.org by Jimmy Lin <y2...@gmail.com> on 2015/03/06 23:56:17 UTC

timeout when using secondary index

Hi,
I ran into an RPC timeout exception when executing a query that involves a
secondary index on a boolean column, for example when the company has more
than 1k persons.

select * from company where company_id=xxxx and isMale = true;

As the docs state, a secondary index with such extremely low cardinality
basically results in 2 large rows, one per value. However, since I also
bound the query with my partition key, I thought that would be consulted
first and then used to narrow down the result, so the query would be
efficient. Is that not the case?

Also, if I simply do
select * from company where company_id=xxxx ;
(without the AND clause on the secondary index), it returns right away.


Or maybe the Cassandra server internally always processes the secondary
index result first?

thanks



I have a simple table

create table company (
   company_id uuid,
   person_id uuid,
   isMale boolean,
   PRIMARY KEY (company_id, person_id)
);
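
The index definition is not shown in this message. As a guess only, the query above would need an index on isMale, created with something along these lines (left unnamed here, since the actual index name is not given):

```cql
-- Assumed definition; the real index statement is not part of the thread.
CREATE INDEX ON company (isMale);
```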

Re: timeout when using secondary index

Posted by Patrick McFadin <pm...@gmail.com>.
Jimmy,

The secondary index is getting scanned since you put the column in your
query. The behavior you are looking for is a coming feature called Global
Indexes slated for 3.0. https://issues.apache.org/jira/browse/CASSANDRA-6477

In the meantime, you could build your own lookup table, even with
cardinality this low. If the point is to find everyone of a certain gender
in a company, give this a try.

create table company_gender (
   company_id uuid,
   gender text,
   person_id uuid,
   -- person_id is a clustering column so each person gets its own row
   PRIMARY KEY (company_id, gender, person_id)
);

Each company would be a partition and you could find all males or females
with a single query. The bonus is that you would get paging which will be
much more efficient.
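
As a rough sketch of that single query (the uuid is a placeholder, and the 'male' / 'female' values are assumed, since nothing here says what would be stored in gender):

```cql
-- Placeholder company id; substitute a real uuid.
SELECT person_id
FROM company_gender
WHERE company_id = 123e4567-e89b-12d3-a456-426614174000
  AND gender = 'male';
```

Both restrictions are on primary key columns, so no secondary index is involved in this read.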

Patrick



