Posted to user@cassandra.apache.org by Daniel Hölbling-Inzko <da...@bitmovin.com> on 2017/10/03 08:36:34 UTC

Migrating a Limit/Offset Pagination and Sorting to Cassandra

Hi,
I am currently working on migrating a service that was so far MySQL-based
to Cassandra.
Everything seems to work fine so far, but a few things in the old service's
API spec are posing some interesting data modeling challenges:

The old service was doing Limit/Offset pagination, which is obviously
something Cassandra can't really do. I understand how paginationState works,
but so far I haven't figured out a good way to make Limit/Offset work on
top of paginationState (and I need to be 100% backwards compatible).
The only approaches I could think of to make Limit/Offset work would create
scalability issues down the road.

The old service allowed sorting by any field. If I understood correctly,
that would require a table for each sort order, right? (In-memory sorting is
unfortunately not an option.)
If so, how can I make the Java DataStax mapper save to another table?
(I really don't want to write a subclass of the entity for each table just
to add the @Table annotation.)

greetings Daniel

Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
I have now finished an implementation where I just save the pagination state
to a separate table and retrieve it to get to the next page.

So far it seems to work pretty well, but I have to do more thorough
testing.
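
Roughly, the retrieval side looks like this with the 3.x Java driver (the
pagination_state / items_by_customer tables and their column names are just
what I picked for this sketch, nothing standard):

    import com.datastax.driver.core.PagingState;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Row;
    import com.datastax.driver.core.Session;
    import com.datastax.driver.core.SimpleStatement;
    import com.datastax.driver.core.Statement;

    public class ContinueFromSavedState {

        // Load the paging state stored for this customer's list and resume the
        // query from it; one driver fetch page corresponds to one API page.
        public ResultSet nextPage(Session session, String customerId, int limit) {
            Row saved = session.execute(
                "SELECT paging_state FROM pagination_state WHERE list_key = ?",
                customerId).one();

            Statement stmt = new SimpleStatement(
                "SELECT * FROM items_by_customer WHERE customer_id = ?", customerId);
            stmt.setFetchSize(limit);
            if (saved != null) {
                stmt.setPagingState(PagingState.fromString(saved.getString("paging_state")));
            }
            return session.execute(stmt);
        }
    }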

Greetings.

Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by Jon Haddad <jo...@jonhaddad.com>.
Seems pretty overengineered, imo, given you can just save the pagination state as Andy Tolbert pointed out.


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by Greg Saylor <gr...@net-virtual.com>.
Yes, we’re using it in production in a 22-node cluster across 4 Amazon regions in a large production system. We were using DSE but recently migrated to Elassandra. There are a few quirks (copy_to isn’t supported, for example), but so far we’ve been pretty pleased with it.

- Greg


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
Thanks for pointing me to Elassandra.
Have you had any experience running this in production at scale? Not sure
if I

I think ES will enter the picture at some point since some things just
don't work efficiently with Cassandra and so it's inevitable in the end.
But I'd rather delay that step for as long as possible since it would add a
lot of complexity and another layer of eventual consistency I'd rather not
deal with at the moment :)

greetings Daniel


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by Andy Tolbert <an...@datastax.com>.
Hi Daniel,

To answer this question:

> How long is the paginationState from the driver current?

The paging state itself contains information about the position in the data
to proceed from, so you don't need to worry about it becoming stale or
invalid.  The only exception is if you upgrade your cluster and start
using a newer protocol version, at which point the paging state will likely
become invalid.  The Java driver guide has a section on saving and reusing
the paging state
<http://docs.datastax.com/en/developer/java-driver/3.3/manual/paging/#saving-and-reusing-the-paging-state>
that covers this.
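
For illustration, the guard you'd want around reusing a saved state looks
roughly like this (the fallback of silently starting over is just one
option, not something the guide prescribes):

    import com.datastax.driver.core.PagingState;
    import com.datastax.driver.core.Statement;
    import com.datastax.driver.core.exceptions.PagingStateException;

    public class PagingStateGuard {

        // Re-apply a previously saved paging state to the same statement. If the
        // state no longer matches (different query/values, or produced under an
        // older protocol version after a cluster upgrade), fall back to page one.
        public static Statement withSavedState(Statement stmt, String savedState) {
            if (savedState != null) {
                try {
                    stmt.setPagingState(PagingState.fromString(savedState));
                } catch (PagingStateException e) {
                    // stale or mismatched state: start over from the first page
                }
            }
            return stmt;
        }
    }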

Thanks,
Andy


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by Greg Saylor <gr...@net-virtual.com>.
Without knowing other details, of course, have you considered using something like Elassandra?  That is a pretty tightly integrated Cassandra + Elasticsearch solution.  You’d insert data into Cassandra as you normally would, then query it with Elasticsearch.  Of course, this would increase your storage requirements.

- Greg




Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by Daniel Hölbling-Inzko <da...@bitmovin.com>.
Thanks Kurt,
I thought about that, but one issue is that we are doing limit/offset, not
pages. One customer can choose to page through the list in 10-item
increments, another might want to page through with 100 elements per page,
so I can't have a clustering key that represents a page range.

What I was thinking of doing was saving the paginationState in a
separate table, along with the limit/offset info of the query the
paginationState originated from, so I can use the last paginationState to
continue the iteration if the customer requests the next page with the
same limit but a different offset.
This breaks down if the customer does a cold offset=1000 request, but that's
something I can throw an error for. What I do need to support is a
customer doing:
Request 1: offset=0 + limit=100
Request 2: offset=100 + limit=100
Request 3: offset=200 + limit=100

So the next question would be: how long does the paginationState from the
driver stay valid? I was thinking about inserting the paginationState with a
TTL into another Cassandra table - not sure if that's smart though.
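
If it helps to picture it, roughly what I have in mind (the table layout
and the one-hour TTL are just my own choices, nothing standard):

    import com.datastax.driver.core.PagingState;
    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    public class PagingStateRecorder {

        // Backing table, created once:
        //   CREATE TABLE pagination_state (
        //       list_key     text,   -- identifies the customer + list being paged
        //       next_offset  int,    -- offset the stored state resumes from
        //       page_limit   int,    -- limit the state was produced with
        //       paging_state text,
        //       PRIMARY KEY (list_key)
        //   );

        // After serving a page, remember where it ended so a follow-up request
        // with offset = next_offset and the same limit can resume from the stored
        // state. The TTL keeps abandoned pagination sessions from piling up.
        public void remember(Session session, String listKey, int nextOffset,
                             int limit, ResultSet rs) {
            PagingState state = rs.getExecutionInfo().getPagingState();
            if (state == null) {
                return; // last page, nothing to resume from
            }
            session.execute(
                "INSERT INTO pagination_state (list_key, next_offset, page_limit, paging_state) "
                    + "VALUES (?, ?, ?, ?) USING TTL 3600",
                listKey, nextOffset, limit, state.toString());
        }
    }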

greetings Daniel


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

Posted by kurt greaves <ku...@instaclustr.com>.
I get the impression that you are paging through a single partition in
Cassandra? If so, you should probably use bounds on clustering keys to get
your "next page". You could use LIMIT here as well, but it's mostly
unnecessary; probably just use the page size that you intend for the API.

Yes, you'll need a table for each sort order, which ties into how you would
use clustering keys for LIMIT/OFFSET. Essentially, just do range slices on
the clustering keys for each table to get your "pages".

Also, I'm assuming there's a lot of data per partition if in-mem sorting
isn't an option; if so, you will want to be wary of creating large
partitions and reading them all at once, although this depends on your data
model and compaction strategy choices.
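
For example, something like this per sort order (table and column names are
made up; a real version would also break ties on the full clustering key):

    import com.datastax.driver.core.ResultSet;
    import com.datastax.driver.core.Session;

    public class ClusteringKeyPaging {

        // One table per sort order, e.g. items ordered by title within a customer:
        //   CREATE TABLE items_by_title (
        //       customer_id text,
        //       title       text,
        //       item_id     uuid,
        //       PRIMARY KEY ((customer_id), title, item_id)
        //   ) WITH CLUSTERING ORDER BY (title ASC, item_id ASC);

        // Next "page" = range slice on the clustering key, starting after the
        // last title seen on the previous page, capped at the API page size.
        public ResultSet nextPage(Session session, String customerId,
                                  String lastTitle, int pageSize) {
            if (lastTitle == null) {
                return session.execute(
                    "SELECT * FROM items_by_title WHERE customer_id = ? LIMIT ?",
                    customerId, pageSize);
            }
            return session.execute(
                "SELECT * FROM items_by_title WHERE customer_id = ? AND title > ? LIMIT ?",
                customerId, lastTitle, pageSize);
        }
    }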
