You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by anuja jain <an...@gmail.com> on 2016/01/07 13:45:59 UTC

Sorting & pagination in apache cassandra 2.1

HI All,
 If suppose I have a cassandra table with structure
CREATE TABLE test.t1 (
    col1 text,
    col2 text,
    col3 text,
    col4 text,
    PRIMARY KEY (col1, col2, col3, col4)
) WITH CLUSTERING ORDER BY (col2 ASC, col3 ASC, col4 ASC);

and it has following data

 col1 | col2 | col3 | col4
------+------+------+------
  abc |  abc |  abc |  abc

and I query the table saying
select * from t1 where col1='abc' order by col3;

it gives me following error
InvalidRequest: code=2200 [Invalid query] message="Order by currently only
support the ordering of columns following their declared order in the
PRIMARY KEY"

While reading on docs I came to know that only the first clustering column
can ordered by independently and for the other columns we need to follow
the sequence of the clustering columns.
My question is, what is the alternative if we need to order by col3 or col4
in my above example without including col2 in order by clause.


Thanks,
Anuja

Re: Sorting & pagination in apache cassandra 2.1

Posted by Carlos Alonso <in...@mrcalonso.com>.

Hi Anuja.

Yeah, that's what he means. Before Cassandra 3.0 the modelling advice is to
have one table per query. This may sound weird from a relational
perspective, but the truth is that writes in Cassandra are very cheap, and
its better to write multiple times and have quick and easy reads than write
just once and have expensive reads.

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 15 January 2016 at 05:57, anuja jain <an...@gmail.com> wrote:

> @Jonathan
> what do you mean by "you'll need to maintain your own materialized view
> tables"?
> does it mean we have to create new table for each query?
>
> On Wed, Jan 13, 2016 at 7:40 PM, Narendra Sharma <
> narendra.sharma@gmail.com> wrote:
>
>> In the example you gave the primary key user _ name is the row key. Since
>> the default partition is random you are getting rows in random order.
>>
>> Since each row no clustering column there is no further grouping of data.
>> Or in simple terms each row has one record and is being returned ordered by
>> column name.
>>
>> To see some meaningful ordering there should be some clustering column
>> defined.
>>
>> You can use create additional column families to maintain ordering. Or
>> use external solutions like elasticsearch.
>> On Jan 12, 2016 10:07 PM, "anuja jain" <an...@gmail.com> wrote:
>>
>>> I understand the meaning of SSTable but whats the reason behind sorting
>>> the table on the basis of int columns first..
>>> Is there any data type preference in cassandra?
>>> Also What is the alternative to creating materialised views if my
>>> cassandra version is prior to 3.0 (specifically 2.1) and which is already
>>> in production.?
>>>
>>>
>>> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli <rc...@eventbrite.com>
>>> wrote:
>>>
>>>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain <an...@gmail.com>
>>>> wrote:
>>>>
>>>>> 1 more question, what does it mean by "cassandra inherently sorts
>>>>> data"?
>>>>>
>>>>
>>>> SSTable = Sorted Strings Table.
>>>>
>>>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>>>
>>>> =Rob
>>>>
>>>
>>>
>

Re: Sorting & pagination in apache cassandra 2.1

Posted by anuja jain <an...@gmail.com>.

@Jonathan
what do you mean by "you'll need to maintain your own materialized view
tables"?
does it mean we have to create new table for each query?

On Wed, Jan 13, 2016 at 7:40 PM, Narendra Sharma <na...@gmail.com>
wrote:

> In the example you gave the primary key user _ name is the row key. Since
> the default partition is random you are getting rows in random order.
>
> Since each row no clustering column there is no further grouping of data.
> Or in simple terms each row has one record and is being returned ordered by
> column name.
>
> To see some meaningful ordering there should be some clustering column
> defined.
>
> You can use create additional column families to maintain ordering. Or use
> external solutions like elasticsearch.
> On Jan 12, 2016 10:07 PM, "anuja jain" <an...@gmail.com> wrote:
>
>> I understand the meaning of SSTable but whats the reason behind sorting
>> the table on the basis of int columns first..
>> Is there any data type preference in cassandra?
>> Also What is the alternative to creating materialised views if my
>> cassandra version is prior to 3.0 (specifically 2.1) and which is already
>> in production.?
>>
>>
>> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli <rc...@eventbrite.com>
>> wrote:
>>
>>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain <an...@gmail.com>
>>> wrote:
>>>
>>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>>
>>>
>>> SSTable = Sorted Strings Table.
>>>
>>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>>
>>> =Rob
>>>
>>
>>

Re: Sorting & pagination in apache cassandra 2.1

Posted by Narendra Sharma <na...@gmail.com>.

In the example you gave the primary key user _ name is the row key. Since
the default partition is random you are getting rows in random order.

Since each row no clustering column there is no further grouping of data.
Or in simple terms each row has one record and is being returned ordered by
column name.

To see some meaningful ordering there should be some clustering column
defined.

You can use create additional column families to maintain ordering. Or use
external solutions like elasticsearch.
On Jan 12, 2016 10:07 PM, "anuja jain" <an...@gmail.com> wrote:

> I understand the meaning of SSTable but whats the reason behind sorting
> the table on the basis of int columns first..
> Is there any data type preference in cassandra?
> Also What is the alternative to creating materialised views if my
> cassandra version is prior to 3.0 (specifically 2.1) and which is already
> in production.?
>
>
> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli <rc...@eventbrite.com>
> wrote:
>
>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain <an...@gmail.com>
>> wrote:
>>
>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>
>>
>> SSTable = Sorted Strings Table.
>>
>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>
>> =Rob
>>
>
>

Re: Sorting & pagination in apache cassandra 2.1

Posted by Jonathan Haddad <jo...@jonhaddad.com>.

The clustering keys determine the sorting of rows within a partition.  The
partitions within a file are sorted by their token (usually computed by
applying the murmur 3 hash to the partition key).

If you are using a version of Cassandra < 3.0, you'll need to maintain your
own materialized view tables.

On Tue, Jan 12, 2016 at 10:07 PM anuja jain <an...@gmail.com> wrote:

> I understand the meaning of SSTable but whats the reason behind sorting
> the table on the basis of int columns first..
> Is there any data type preference in cassandra?
> Also What is the alternative to creating materialised views if my
> cassandra version is prior to 3.0 (specifically 2.1) and which is already
> in production.?
>
>
> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli <rc...@eventbrite.com>
> wrote:
>
>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain <an...@gmail.com>
>> wrote:
>>
>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>
>>
>> SSTable = Sorted Strings Table.
>>
>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>
>> =Rob
>>
>
>

Re: Sorting & pagination in apache cassandra 2.1

Posted by anuja jain <an...@gmail.com>.

I understand the meaning of SSTable but whats the reason behind sorting the
table on the basis of int columns first..
Is there any data type preference in cassandra?
Also What is the alternative to creating materialised views if my cassandra
version is prior to 3.0 (specifically 2.1) and which is already in
production.?

On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain <an...@gmail.com> wrote:
>
>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>
>
> SSTable = Sorted Strings Table.
>
> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>
> =Rob
>

Re: Sorting & pagination in apache cassandra 2.1

Posted by Robert Coli <rc...@eventbrite.com>.

On Mon, Jan 11, 2016 at 11:30 PM, anuja jain <an...@gmail.com> wrote:

> 1 more question, what does it mean by "cassandra inherently sorts data"?
>

SSTable = Sorted Strings Table.

It doesn't contain "Strings" anymore, really, but that's a hint.. :)

=Rob

Re: Sorting & pagination in apache cassandra 2.1

Posted by Carlos Alonso <in...@mrcalonso.com>.

Hi Anuja.

Cassandra saves records on disk sorted by the clustering column. In this
case you haven't selected any but it looks like is picking birth_year as
the clustering column. I don't know which is the clustering column
selection algorithm though (maybe alphabetically by name?).

Regards

Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>

On 12 January 2016 at 07:30, anuja jain <an...@gmail.com> wrote:

> 1 more question, what does it mean by "cassandra inherently sorts data"?
> For eg:
> I have a table with schema
>
> CREATE TABLE users (
>
>             ...   user_name varchar PRIMARY KEY,
>
>             ...   password varchar,
>
>             ...   gender varchar,
>
>             ...   session_token varchar,
>
>             ...   state varchar,
>
>             ...   birth_year bigint
>
>             ... );
>
> I inserted data in random order but I on firing select statement I get
> data sorted by birth_year..  So why does this happen?
>
>  cqlsh:learning> select * from users;
>
>
>
> user_name | birth_year | gender | password | session_token | state
>
> -----------+------------+--------+----------+---------------+---------
>
>       John |       1979 |      M |     qwer |           abc |      JK
>
>    Dharini |       1980 |      F |      Xyz |           abc | Gujarat
>
>      Keval |       1990 |      M |      DDD |           abc |      WB
>
> On Tue, Jan 12, 2016 at 12:52 PM, anuja jain <an...@gmail.com> wrote:
>
>> What is the alternative if my cassandra version is prior to 3.0
>> (specifically) 2.1) and which is already in production.?
>>
>> Also as per the docs given at
>>
>>
>> https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
>>  what does it mean by we need to do capacity planning if we need to
>> search using SOLR. What is other alternative when we do not know the size
>> of the data ?
>>
>>  Thanks,
>>
>> Anuja
>>
>>
>>
>> On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>>
>>>
>>> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain <an...@gmail.com> wrote:
>>>
>>>> My question is, what is the alternative if we need to order by col3 or
>>>> col4 in my above example without including col2 in order by clause.
>>>>
>>>
>>> The server-side alternative is to create a second table (or a
>>> materialized view, if you're using 3.0+) that uses a different clustering
>>> order.  Cassandra purposefully only supports simple and efficient queries
>>> that can be handled quickly (with a few exceptions), and arbitrary ordering
>>> is not part of that, especially if you consider complications like paging.
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax <http://datastax.com/>
>>>
>>
>>
>

Re: Sorting & pagination in apache cassandra 2.1

Posted by anuja jain <an...@gmail.com>.

1 more question, what does it mean by "cassandra inherently sorts data"?
For eg:
I have a table with schema

CREATE TABLE users (

            ...   user_name varchar PRIMARY KEY,

            ...   password varchar,

            ...   gender varchar,

            ...   session_token varchar,

            ...   state varchar,

            ...   birth_year bigint

            ... );

I inserted data in random order but I on firing select statement I get data
sorted by birth_year..  So why does this happen?

 cqlsh:learning> select * from users;



user_name | birth_year | gender | password | session_token | state

-----------+------------+--------+----------+---------------+---------

      John |       1979 |      M |     qwer |           abc |      JK

   Dharini |       1980 |      F |      Xyz |           abc | Gujarat

     Keval |       1990 |      M |      DDD |           abc |      WB

On Tue, Jan 12, 2016 at 12:52 PM, anuja jain <an...@gmail.com> wrote:

> What is the alternative if my cassandra version is prior to 3.0
> (specifically) 2.1) and which is already in production.?
>
> Also as per the docs given at
>
>
> https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
>  what does it mean by we need to do capacity planning if we need to
> search using SOLR. What is other alternative when we do not know the size
> of the data ?
>
>  Thanks,
>
> Anuja
>
>
>
> On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs <ty...@datastax.com> wrote:
>
>>
>> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain <an...@gmail.com> wrote:
>>
>>> My question is, what is the alternative if we need to order by col3 or
>>> col4 in my above example without including col2 in order by clause.
>>>
>>
>> The server-side alternative is to create a second table (or a
>> materialized view, if you're using 3.0+) that uses a different clustering
>> order.  Cassandra purposefully only supports simple and efficient queries
>> that can be handled quickly (with a few exceptions), and arbitrary ordering
>> is not part of that, especially if you consider complications like paging.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>

Re: Sorting & pagination in apache cassandra 2.1

Posted by anuja jain <an...@gmail.com>.

What is the alternative if my cassandra version is prior to 3.0
(specifically) 2.1) and which is already in production.?

Also as per the docs given at

https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
 what does it mean by we need to do capacity planning if we need to search
using SOLR. What is other alternative when we do not know the size of the
data ?

 Thanks,

Anuja

On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain <an...@gmail.com> wrote:
>
>> My question is, what is the alternative if we need to order by col3 or
>> col4 in my above example without including col2 in order by clause.
>>
>
> The server-side alternative is to create a second table (or a materialized
> view, if you're using 3.0+) that uses a different clustering order.
> Cassandra purposefully only supports simple and efficient queries that can
> be handled quickly (with a few exceptions), and arbitrary ordering is not
> part of that, especially if you consider complications like paging.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>

Re: Sorting & pagination in apache cassandra 2.1

Posted by Tyler Hobbs <ty...@datastax.com>.

On Thu, Jan 7, 2016 at 6:45 AM, anuja jain <an...@gmail.com> wrote:

> My question is, what is the alternative if we need to order by col3 or
> col4 in my above example without including col2 in order by clause.
>

The server-side alternative is to create a second table (or a materialized
view, if you're using 3.0+) that uses a different clustering order.
Cassandra purposefully only supports simple and efficient queries that can
be handled quickly (with a few exceptions), and arbitrary ordering is not
part of that, especially if you consider complications like paging.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>