
Posted to user@cassandra.apache.org by Kumar Ranjan <wi...@gmail.com> on 2013/12/12 16:56:47 UTC

Cassandra python pagination

Hey Folks,

I need some ideas for implementing pagination in the browser, driven from
the backend. The Python code (backend) gets requests from the frontend with
page=1,2,3,4 and so on, and count_per_page=50.

I am trying to use xget with the column_count and buffer_size parameters. Can
someone explain how it works? From the docs, my understanding is that I
can do something like,


total_cols is the total number of columns for that key.
count is what the user sends me.

.xget('Twitter_search', hh, column_count=total_cols, buffer_size=count):

Is my understanding correct? It is not working for page 2 and beyond.
Please enlighten me with suggestions.

Thanks.
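
For readers following the thread, here is a minimal sketch of how these two parameters seem to interact, assuming the client is pycassa (as confirmed later in the thread). As I understand pycassa's documentation, column_count limits the total number of columns xget yields, while buffer_size only controls how many columns are fetched from the server per round trip. The code below is a pure-Python stand-in, not pycassa itself; fake_xget and the in-memory row are hypothetical names used only for illustration.

```python
def fake_xget(row, column_start="", column_count=None, buffer_size=3):
    """Yield (name, value) pairs in name order, fetching buffer_size at a time."""
    # Hypothetical stand-in for a column slice: names at or after column_start.
    names = sorted(n for n in row if n >= column_start)
    yielded = 0
    for offset in range(0, len(names), buffer_size):
        batch = names[offset:offset + buffer_size]  # one simulated round trip
        for name in batch:
            if column_count is not None and yielded >= column_count:
                return
            yield name, row[name]
            yielded += 1


row = {"c%02d" % i: i for i in range(1, 8)}  # sample columns c01..c07

# Changing buffer_size changes the number of round trips, not the result.
small = list(fake_xget(row, column_count=7, buffer_size=2))
large = list(fake_xget(row, column_count=7, buffer_size=5))

# A later page needs a different column_start, not a different buffer_size.
page2 = list(fake_xget(row, column_start="c04", column_count=3))
```

Under these assumptions, passing column_count=total_cols with buffer_size=count returns every column on every call, which would explain why page 2 looks the same as page 1.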

Re: Cassandra python pagination

Posted by Aaron Morton <aa...@thelastpickle.com>.

> First approach:

Sounds good. 

> Second approach ( I used in production ):
If the row gets big enough this will have bad performance. 

A

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Cassandra python pagination

Posted by Aaron Morton <aa...@thelastpickle.com>.

> Is there something wrong with it? Here 1234555665_53323232 and 2344555665_53323232 are super columns. Also, if I have to represent this data with the new composite comparator, how will I accomplish that?

Composite types via pycassa: http://pycassa.github.io/pycassa/assorted/composite_types.html?highlight=composite

Create a composite where the super column name is the first part and the column name is the second part; this is basically what CQL 3 does.

You will have to make all columns the same type though.

Or use CQL 3; it works well for these sorts of models.
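
A rough illustration of the composite layout in plain Python, with tuples standing in for composite column names. This is only a sketch of the data shape; it does not use pycassa's composite support, and flatten and slice_prefix are hypothetical helpers. All values are strings here, in line with the note about making all columns the same type.

```python
def flatten(super_row):
    """Turn {super_col: {sub_col: value}} into {(super_col, sub_col): value}."""
    return {
        (super_name, sub_name): value
        for super_name, sub_columns in super_row.items()
        for sub_name, value in sub_columns.items()
    }


def slice_prefix(flat_row, super_name):
    """Return the sub-columns whose first composite component is super_name."""
    return {
        name[1]: value
        for name, value in sorted(flat_row.items())
        if name[0] == super_name
    }


# Sample data shaped like the row in the question (values kept as strings).
super_row = {
    "1234555665_53323232": {"approved": "false", "screen_name": "goodname"},
    "2344555665_53323232": {"approved": "false", "screen_name": "newname"},
}

flat = flatten(super_row)
first = slice_prefix(flat, "1234555665_53323232")
```

Because tuples sort by their first element and then their second, all sub-columns of one former super column stay adjacent, which is what makes a slice on the first component possible.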

Cheers

	
-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 20/12/2013, at 7:22 am, Kumar Ranjan <wi...@gmail.com> wrote:

> Rob - I have a question following your advice. This is how I define my column family:
>
> validators = {
>     'approved':            'UTF8Type',
>     'tid':                 'UTF8Type',
>     'iid':                 'UTF8Type',
>     'score':               'IntegerType',
>     'likes':               'IntegerType',
>     'retweet':             'IntegerType',
>     'favorite':            'IntegerType',
>     'screen_name':         'UTF8Type',
>     'created_date':        'UTF8Type',
>     'expanded_url':        'UTF8Type',
>     'embedly_data':        'BytesType',
> }
>
> SYSTEM_MANAGER.create_column_family('KeySpaceNNN', 'Twitter_Instagram', default_validation_class='UTF8Type', super=True, comparator='UTF8Type', key_validation_class='UTF8Type', column_validation_classes=validators)
>
> Actual data representation:
>
> 'row_key': {'1234555665_53323232': {'approved': 'false', 'tid': 123, 'iid': 346666, 'score': 2, 'likes': 50, 'retweets': 45, 'favorite': 34, 'screen_name': 'goodname'},
>             '2344555665_53323232': {'approved': 'false', 'tid': 134, 'iid': 346666, 'score': 2, 'likes': 50, 'retweets': 45, 'favorite': 34, 'screen_name': 'newname'},
>             .....
>            }
>
> Is there something wrong with it? Here 1234555665_53323232 and 2344555665_53323232 are super columns. Also, if I have to represent this data with the new composite comparator, how will I accomplish that?
> 
> Please let me know.
>
> Regards.
>
> On Wed, Dec 18, 2013 at 5:32 PM, Robert Coli <rc...@eventbrite.com> wrote:
>> On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan <wi...@gmail.com> wrote:
>>> Second approach ( I used in production ):
>>> - fetch all super columns for a row key
>>
>> Stock response mentioning that super columns are anti-advised for use, especially in brand new code.
>>
>> =Rob

Re: Cassandra python pagination

Posted by Kumar Ranjan <wi...@gmail.com>.

Rob - I have a question following your advice. This is how I define my
column family:

validators = {
    'approved':            'UTF8Type',
    'tid':                 'UTF8Type',
    'iid':                 'UTF8Type',
    'score':               'IntegerType',
    'likes':               'IntegerType',
    'retweet':             'IntegerType',
    'favorite':            'IntegerType',
    'screen_name':         'UTF8Type',
    'created_date':        'UTF8Type',
    'expanded_url':        'UTF8Type',
    'embedly_data':        'BytesType',
}

SYSTEM_MANAGER.create_column_family('KeySpaceNNN', 'Twitter_Instagram',
    default_validation_class='UTF8Type', super=True, comparator='UTF8Type',
    key_validation_class='UTF8Type', column_validation_classes=validators)

Actual data representation:

'row_key': {'1234555665_53323232': {'approved': 'false', 'tid': 123,
                'iid': 346666, 'score': 2, 'likes': 50, 'retweets': 45,
                'favorite': 34, 'screen_name': 'goodname'},

            '2344555665_53323232': {'approved': 'false', 'tid': 134,
                'iid': 346666, 'score': 2, 'likes': 50, 'retweets': 45,
                'favorite': 34, 'screen_name': 'newname'},

            .....

           }

Is there something wrong with it? Here 1234555665_53323232 and
2344555665_53323232 are super columns. Also, if I have to represent this
data with the new composite comparator, how will I accomplish that?


Please let me know.


Regards.


On Wed, Dec 18, 2013 at 5:32 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan <wi...@gmail.com> wrote:
>
>> Second approach ( I used in production ):
>> - fetch all super columns for a row key
>>
>
> Stock response mentioning that super columns are anti-advised for use,
> especially in brand new code.
>
> =Rob
>
>

Re: Cassandra python pagination

Posted by Robert Coli <rc...@eventbrite.com>.

On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan <wi...@gmail.com> wrote:

> Second approach ( I used in production ):
> - fetch all super columns for a row key
>

Stock response mentioning that super columns are anti-advised for use,
especially in brand new code.

=Rob

Re: Cassandra python pagination

Posted by Kumar Ranjan <wi...@gmail.com>.

I am using pycassa. Here is how I solved this issue. I will discuss two
approaches; the first did not work out for me. Thanks, Aaron, for your
attention.

First approach:
- Say column_count = 10.
- Collect the first 11 columns, sort the first 10, and send them to the user
(front end) as a JSON object, along with last=11th_column.
- The user then requests page 2 with prev = 1st_column_id, column_start =
11th_column and column_count = 10.
- This way I can traverse to the next page and the previous page.
- The only issue with this approach is that I don't have all the columns in
the super column sorted, so this did not work.
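
The steps of this first approach can be sketched in plain Python as follows. This is only an illustration under the assumption that the columns are already stored in the order the pages should appear in (the very condition that did not hold here); get_page and the sample column names are hypothetical, and the list comprehension stands in for a column slice starting at column_start.

```python
def get_page(sorted_names, column_start, column_count):
    """Return (page, next_start); next_start is None on the last page."""
    # Emulate a column slice: names at or after column_start,
    # limited to column_count + 1 so the extra one marks the next page.
    candidates = [n for n in sorted_names if n >= column_start]
    fetched = candidates[:column_count + 1]
    page = fetched[:column_count]
    next_start = fetched[column_count] if len(fetched) > column_count else None
    return page, next_start


names = ["col%02d" % i for i in range(1, 26)]  # 25 sample columns, in stored order

page1, cursor = get_page(names, "", 10)
page2, cursor2 = get_page(names, cursor, 10)
page3, cursor3 = get_page(names, cursor2, 10)
```

The front end only has to send back the cursor it received with the previous page; a None cursor means there is no next page.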

Second approach ( I used in production ):
- Fetch all super columns for a row key.
- Sort them in Python using sorted and a lambda function based on column
values.
- Once sorted, prepare buckets, each the size of one page (page size = column
count). Also filter out any rogue data if needed.
- Store the page-by-page results in Redis with keys such as
'row_key|page_1|super_column', and keep refreshing Redis periodically.
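
A minimal sketch of this second approach, with a plain dict standing in for Redis so that it runs without any server. The helper names, the simplified key format 'row_key|page_N', the choice of score as the sort column, and the sample values are all assumptions made for illustration.

```python
def build_pages(super_columns, sort_key, page_size):
    """Sort super columns by one sub-column value and split them into pages."""
    ordered = sorted(super_columns.items(), key=lambda item: item[1][sort_key])
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


def cache_pages(cache, row_key, pages):
    """Store each page under a key such as 'row_key|page_1'."""
    for number, page in enumerate(pages, start=1):
        cache["%s|page_%d" % (row_key, number)] = page


# Made-up sample data; in the thread this would be every super column of a row.
super_columns = {
    "1234555665_53323232": {"score": 2, "screen_name": "goodname"},
    "2344555665_53323232": {"score": 7, "screen_name": "newname"},
    "3344555665_53323232": {"score": 5, "screen_name": "othername"},
}

cache = {}  # a plain dict standing in for Redis
pages = build_pages(super_columns, "score", page_size=2)
cache_pages(cache, "row_key", pages)
```

Every refresh re-reads and re-sorts the whole row, so the cost of this approach grows with the size of the row.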

I am sure there must be a better and brighter approach, but for now the 2nd
approach is working. Thoughts?



On Tue, Dec 17, 2013 at 9:19 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> CQL3 and Thrift do not support an offset clause, so you can only really
> support next / prev page calls to the database.
>
>> I am trying to use xget with column_count and buffer_size parameters. Can
>> someone explain how it works? From the docs, my understanding is that I
>> can do something like,
>
> What client are you using?
> xget is not a standard Cassandra function.
>
> Cheers
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>

Re: Cassandra python pagination

Posted by Aaron Morton <aa...@thelastpickle.com>.

CQL3 and Thrift do not support an offset clause, so you can only really support next / prev page calls to the database.

> I am trying to use xget with column_count and buffer_size parameters. Can someone explain how it works? From the docs, my understanding is that I can do something like,
What client are you using?
xget is not a standard Cassandra function.

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
