Posted to user@cassandra.apache.org by Jim Ancona <ji...@anconafamily.com> on 2016/01/04 16:13:25 UTC

Data Modeling: Partition Size and Query Efficiency

A problem that I have run into repeatedly when doing schema design is how
to control partition size while still allowing for efficient multi-row
queries.

We want to limit partition size to some number between 10 and 100 megabytes
to avoid operational issues. The standard way to do that is to figure out
the maximum number of rows that your "natural partition key" will ever need
to support and then add an additional artificial partition key that
segments the rows sufficiently to keep the partition size under the
maximum. In the case of time series data, this is often done by bucketing
by time period, i.e. creating a new partition every minute, hour or day.
For non-time-series data, it is often done with something like
Hash(clustering-key) mod desired-number-of-partitions.
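
To make that concrete, here is a rough CQL sketch of both variants (all
table and column names below are made up for illustration):

    -- Time-series style: bucket by day so no partition grows without bound.
    CREATE TABLE events_by_source_and_day (
        source_id  text,
        day        text,       -- artificial bucket, e.g. '2016-01-04'
        event_time timeuuid,
        payload    blob,
        PRIMARY KEY ((source_id, day), event_time)
    );

    -- Non-time-series style: the application computes
    -- bucket = Hash(object_id) mod desired-number-of-partitions.
    CREATE TABLE objects_by_customer (
        customer_id text,
        bucket      int,
        object_id   uuid,
        payload     blob,
        PRIMARY KEY ((customer_id, bucket), object_id)
    );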

In my case, multi-row queries to support a REST API typically return a page
of results, where the page size might be anywhere from a few dozen up to
thousands. For query efficiency I want the average number of rows per
partition to be large enough that a query can be satisfied by reading a
small number of partitions--ideally one.

So I want to simultaneously limit the maximum number of rows per partition
and yet maintain a large enough average number of rows per partition to
make my queries efficient. But with my data the ratio between maximum and
average can be very large (up to four orders of magnitude).

Here is an example:


                 Rows per Partition   Partition Size
Mode                              1             1 KB
Median                          500           500 KB
90th percentile               5,000             5 MB
99th percentile              50,000            50 MB
Maximum                   2,500,000           2.5 GB

In this case, 99% of my data could fit in a single 50 MB partition. But if
I use the standard approach, I have to split my partitions into 50 pieces
to accommodate the largest data. That means that to query the 700 rows for
my median case, I have to read 50 partitions instead of one.
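
To illustrate with the hypothetical bucketed table sketched above, a
50-bucket layout turns one logical read into something like:

    -- One query per bucket (or a single IN query listing all 50 bucket
    -- values), even though the median customer has only a few hundred rows:
    SELECT object_id, payload FROM objects_by_customer
     WHERE customer_id = 'acme' AND bucket = 0;
    -- ...repeated for bucket = 1 through bucket = 49.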

If you try to deal with this by starting a new partition when an old one
fills up, you have a nasty distributed consensus problem, along with
read-before-write. Cassandra LWT wasn't available the last time I dealt
with this, but might help with the consensus part today. But there are
still some nasty corner cases.

I have some thoughts on other ways to solve this, but they all have
drawbacks. So I thought I'd ask here and hope that someone has a better
approach.

Thanks in advance,

Jim

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Jim Ancona <ji...@anconafamily.com>.
Hi Jack,

Thanks for your response. My answers inline...

On Tue, Jan 5, 2016 at 11:52 AM, Jack Krupansky <ja...@gmail.com>
wrote:

> Jim, I don't quite get why you think you would need to query 50 partitions
> to return merely hundreds or thousands of rows. Please elaborate. I mean,
> sure, for that extreme 100th percentile, yes, you would query a lot of
> partitions, but for the 90th percentile it would be just one. Even the 99th
> percentile would just be one or at most a few.
>
Exactly, but, as I mentioned in my email, the normal way of segmenting
large partitions is to use some deterministic bucketing mechanism to bucket
rows into different partitions. If you know of a way to make the number of
buckets vary with the number of rows, I'd love to hear about it.

It would help if you could elaborate on the actual access pattern - how
> rapidly is the data coming in and from where. You can do just a little more
> work at the app level and use Cassandra more effectively.
>
The write pattern is batches of inserts/updates mixed with some single-row
inserts/updates. Not surprisingly, the customers with more data also do
more writes.


> As always, we look to queries to determine what the Cassandra data model
> should look like, so elaborate what your app needs to see. What exactly is
> the app querying for - a single key, a slice, or... what?
>
The use case here is sequential access to some or all of a customer's rows
in order to filter based on other criteria. The order doesn't matter much,
as long as it's well-defined.


> And, as always, you commonly need to store the data in multiple query
> tables so that the data model matches the desired query pattern.
>
> Are the row sizes very dynamic, with some extremely large, or is it just
> the number of rows that is making size an issue?
>
No, row sizes don't vary much, just the number of rows per customer.


>
> Maybe let the app keep a small cache of active partitions and their
> current size so that the app can decide when to switch to a new bucket. Do
> a couple of extra queries when a key is not in that cache to determine the
> partition size and count to initialize the cache entry for a key. If
> necessary, keep a separate table that tracks the partition size or maybe
> just the (rough) row count to use to determine when a new partition is
> needed.
>

I've done almost exactly what you suggest in a previous application. The
issue is that the cache of active partitions needs to be consistent for
multiple writers and the transition from one bucket to the next really
wants to be transactional. Hence my reference to a "nasty distributed
consensus problem" and Clint's reference to an "anti-pattern". I'd like to
avoid it if I can.
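
For reference, the kind of tracking table you describe might look roughly
like this (names are made up; a counter is only an approximate count, which
is part of why the bucket transition is hard to make transactional):

    -- Rough row count per (customer, bucket), maintained by the writers.
    CREATE TABLE partition_row_counts (
        customer_id text,
        bucket      int,
        row_count   counter,
        PRIMARY KEY ((customer_id, bucket))
    );

    -- Bump on every insert; read it back to decide when to start a new bucket.
    UPDATE partition_row_counts SET row_count = row_count + 1
     WHERE customer_id = 'acme' AND bucket = 0;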

Jim


>
> -- Jack Krupansky
>
> On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona <ji...@anconafamily.com> wrote:
>
>> Thanks for responding!
>>
>> My natural partition key is a customer id. Our customers have widely
>> varying amounts of data. Since the vast majority of them have data that's
>> small enough to fit in a single partition, I'd like to avoid imposing
>> unnecessary overhead on the 99% just to avoid issues with the largest 1%.
>>
>> The approach to querying across multiple partitions you describe is
>> pretty much what I have in mind. The trick is to avoid having to query 50
>> partitions to return a few hundred or thousand rows.
>>
>> I agree that sequentially filling partitions is something to avoid.
>> That's why I'm hoping someone can suggest a good alternative.
>>
>> Jim
>>
>>
>>
>>
>>
>> On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin <
>> clintlmartin@coolfiretechnologies.com> wrote:
>>
>>> You should endeavor to use a repeatable method of segmenting your data.
>>> Swapping partitions every time you "fill one" seems like an anti pattern to
>>> me. but I suppose it really depends on what your primary key is. Can you
>>> share some more information on this?
>>>
>>> In the past I have utilized the consistent hash method you described
>>> (add an artificial row key segment by modulo some part of the clustering
>>> key by a fixed position count) combined with a lazy evaluation cursor.
>>>
>>> The lazy evaluation cursor essentially is set up to query X number of
>>> partitions simultaneously, but to execute those queries only as needed to
>>> fill the page size. To perform paging you have to know the last primary key
>>> that was returned so you can use that to limit the next iteration.
>>>
>>> You can trade latency for additional work load by controlling the number
>>> of concurrent executions you do as the iterating occurs. Or you can
>>> minimize the work on your cluster by querying each partition one at a time.
>>>
>>> Unfortunately due to the artificial partition key segment you cannot
>>> iterate or page in any particular order...(at least across partitions)
>>> Unless your hash function can also provide you some ordering guarantees.
>>>
>>> It all just depends on your requirements.
>>>
>>> Clint
>>> On Jan 4, 2016 10:13 AM, "Jim Ancona" <ji...@anconafamily.com> wrote:
>>>
>>>> A problem that I have run into repeatedly when doing schema design is
>>>> how to control partition size while still allowing for efficient multi-row
>>>> queries.
>>>>
>>>> We want to limit partition size to some number between 10 and 100
>>>> megabytes to avoid operational issues. The standard way to do that is to
>>>> figure out the maximum number of rows that your "natural partition key"
>>>> will ever need to support and then add an additional artificial partition
>>>> key that segments the rows sufficiently to keep the partition size
>>>> under the maximum. In the case of time series data, this is often done by
>>>> bucketing by time period, i.e. creating a new partition every minute, hour
>>>> or day. For non-time series data by doing something like
>>>> Hash(clustering-key) mod desired-number-of-partitions.
>>>>
>>>> In my case, multi-row queries to support a REST API typically return a
>>>> page of results, where the page size might be anywhere from a few dozen up
>>>> to thousands. For query efficiency I want the average number of rows per
>>>> partition to be large enough that a query can be satisfied by reading a
>>>> small number of partitions--ideally one.
>>>>
>>>> So I want to simultaneously limit the maximum number of rows per
>>>> partition and yet maintain a large enough average number of rows per
>>>> partition to make my queries efficient. But with my data the ratio between
>>>> maximum and average can be very large (up to four orders of magnitude).
>>>>
>>>> Here is an example:
>>>>
>>>>
>>>>                  Rows per Partition   Partition Size
>>>> Mode                              1             1 KB
>>>> Median                          500           500 KB
>>>> 90th percentile               5,000             5 MB
>>>> 99th percentile              50,000            50 MB
>>>> Maximum                   2,500,000           2.5 GB
>>>>
>>>> In this case, 99% of my data could fit in a single 50 MB partition. But
>>>> if I use the standard approach, I have to split my partitions into 50
>>>> pieces to accommodate the largest data. That means that to query the 700
>>>> rows for my median case, I have to read 50 partitions instead of one.
>>>>
>>>> If you try to deal with this by starting a new partition when an old
>>>> one fills up, you have a nasty distributed consensus problem, along with
>>>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>>>> with this, but might help with the consensus part today. But there are
>>>> still some nasty corner cases.
>>>>
>>>> I have some thoughts on other ways to solve this, but they all have
>>>> drawbacks. So I thought I'd ask here and hope that someone has a better
>>>> approach.
>>>>
>>>> Thanks in advance,
>>>>
>>>> Jim
>>>>
>>>>
>>
>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Jack Krupansky <ja...@gmail.com>.
Jim, I don't quite get why you think you would need to query 50 partitions
to return merely hundreds or thousands of rows. Please elaborate. I mean,
sure, for that extreme 100th percentile, yes, you would query a lot of
partitions, but for the 90th percentile it would be just one. Even the 99th
percentile would just be one or at most a few.

It would help if you could elaborate on the actual access pattern - how
rapidly is the data coming in and from where. You can do just a little more
work at the app level and use Cassandra more effectively.

As always, we look to queries to determine what the Cassandra data model
should look like, so elaborate what your app needs to see. What exactly is
the app querying for - a single key, a slice, or... what?

And, as always, you commonly need to store the data in multiple query
tables so that the data model matches the desired query pattern.

Are the row sizes very dynamic, with some extremely large, or is it just
the number of rows that is making size an issue?

Maybe let the app keep a small cache of active partitions and their current
size so that the app can decide when to switch to a new bucket. Do a couple
of extra queries when a key is not in that cache to determine the
partition size and count to initialize the cache entry for a key. If
necessary, keep a separate table that tracks the partition size or maybe
just the (rough) row count to use to determine when a new partition is
needed.


-- Jack Krupansky

On Tue, Jan 5, 2016 at 11:07 AM, Jim Ancona <ji...@anconafamily.com> wrote:

> Thanks for responding!
>
> My natural partition key is a customer id. Our customers have widely
> varying amounts of data. Since the vast majority of them have data that's
> small enough to fit in a single partition, I'd like to avoid imposing
> unnecessary overhead on the 99% just to avoid issues with the largest 1%.
>
> The approach to querying across multiple partitions you describe is pretty
> much what I have in mind. The trick is to avoid having to query 50
> partitions to return a few hundred or thousand rows.
>
> I agree that sequentially filling partitions is something to avoid. That's
> why I'm hoping someone can suggest a good alternative.
>
> Jim
>
>
>
>
>
> On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin <
> clintlmartin@coolfiretechnologies.com> wrote:
>
>> You should endeavor to use a repeatable method of segmenting your data.
>> Swapping partitions every time you "fill one" seems like an anti pattern to
>> me. but I suppose it really depends on what your primary key is. Can you
>> share some more information on this?
>>
>> In the past I have utilized the consistent hash method you described (add
>> an artificial row key segment by modulo some part of the clustering key by
>> a fixed position count) combined with a lazy evaluation cursor.
>>
>> The lazy evaluation cursor essentially is set up to query X number of
>> partitions simultaneously, but to execute those queries only as needed to
>> fill the page size. To perform paging you have to know the last primary key
>> that was returned so you can use that to limit the next iteration.
>>
>> You can trade latency for additional work load by controlling the number
>> of concurrent executions you do as the iterating occurs. Or you can
>> minimize the work on your cluster by querying each partition one at a time.
>>
>> Unfortunately due to the artificial partition key segment you cannot
>> iterate or page in any particular order...(at least across partitions)
>> Unless your hash function can also provide you some ordering guarantees.
>>
>> It all just depends on your requirements.
>>
>> Clint
>> On Jan 4, 2016 10:13 AM, "Jim Ancona" <ji...@anconafamily.com> wrote:
>>
>>> A problem that I have run into repeatedly when doing schema design is
>>> how to control partition size while still allowing for efficient multi-row
>>> queries.
>>>
>>> We want to limit partition size to some number between 10 and 100
>>> megabytes to avoid operational issues. The standard way to do that is to
>>> figure out the maximum number of rows that your "natural partition key"
>>> will ever need to support and then add an additional artificial partition
>>> key that segments the rows sufficiently to keep the partition size
>>> under the maximum. In the case of time series data, this is often done by
>>> bucketing by time period, i.e. creating a new partition every minute, hour
>>> or day. For non-time series data by doing something like
>>> Hash(clustering-key) mod desired-number-of-partitions.
>>>
>>> In my case, multi-row queries to support a REST API typically return a
>>> page of results, where the page size might be anywhere from a few dozen up
>>> to thousands. For query efficiency I want the average number of rows per
>>> partition to be large enough that a query can be satisfied by reading a
>>> small number of partitions--ideally one.
>>>
>>> So I want to simultaneously limit the maximum number of rows per
>>> partition and yet maintain a large enough average number of rows per
>>> partition to make my queries efficient. But with my data the ratio between
>>> maximum and average can be very large (up to four orders of magnitude).
>>>
>>> Here is an example:
>>>
>>>
>>>                  Rows per Partition   Partition Size
>>> Mode                              1             1 KB
>>> Median                          500           500 KB
>>> 90th percentile               5,000             5 MB
>>> 99th percentile              50,000            50 MB
>>> Maximum                   2,500,000           2.5 GB
>>>
>>> In this case, 99% of my data could fit in a single 50 MB partition. But
>>> if I use the standard approach, I have to split my partitions into 50
>>> pieces to accommodate the largest data. That means that to query the 700
>>> rows for my median case, I have to read 50 partitions instead of one.
>>>
>>> If you try to deal with this by starting a new partition when an old one
>>> fills up, you have a nasty distributed consensus problem, along with
>>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>>> with this, but might help with the consensus part today. But there are
>>> still some nasty corner cases.
>>>
>>> I have some thoughts on other ways to solve this, but they all have
>>> drawbacks. So I thought I'd ask here and hope that someone has a better
>>> approach.
>>>
>>> Thanks in advance,
>>>
>>> Jim
>>>
>>>
>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Jim Ancona <ji...@anconafamily.com>.
Thanks for responding!

My natural partition key is a customer id. Our customers have widely
varying amounts of data. Since the vast majority of them have data that's
small enough to fit in a single partition, I'd like to avoid imposing
unnecessary overhead on the 99% just to avoid issues with the largest 1%.

The approach to querying across multiple partitions you describe is pretty
much what I have in mind. The trick is to avoid having to query 50
partitions to return a few hundred or thousand rows.

I agree that sequentially filling partitions is something to avoid. That's
why I'm hoping someone can suggest a good alternative.

Jim





On Mon, Jan 4, 2016 at 8:07 PM, Clint Martin <
clintlmartin@coolfiretechnologies.com> wrote:

> You should endeavor to use a repeatable method of segmenting your data.
> Swapping partitions every time you "fill one" seems like an anti pattern to
> me. but I suppose it really depends on what your primary key is. Can you
> share some more information on this?
>
> In the past I have utilized the consistent hash method you described (add
> an artificial row key segment by modulo some part of the clustering key by
> a fixed position count) combined with a lazy evaluation cursor.
>
> The lazy evaluation cursor essentially is set up to query X number of
> partitions simultaneously, but to execute those queries only as needed to
> fill the page size. To perform paging you have to know the last primary key
> that was returned so you can use that to limit the next iteration.
>
> You can trade latency for additional work load by controlling the number
> of concurrent executions you do as the iterating occurs. Or you can
> minimize the work on your cluster by querying each partition one at a time.
>
> Unfortunately due to the artificial partition key segment you cannot
> iterate or page in any particular order...(at least across partitions)
> Unless your hash function can also provide you some ordering guarantees.
>
> It all just depends on your requirements.
>
> Clint
> On Jan 4, 2016 10:13 AM, "Jim Ancona" <ji...@anconafamily.com> wrote:
>
>> A problem that I have run into repeatedly when doing schema design is how
>> to control partition size while still allowing for efficient multi-row
>> queries.
>>
>> We want to limit partition size to some number between 10 and 100
>> megabytes to avoid operational issues. The standard way to do that is to
>> figure out the maximum number of rows that your "natural partition key"
>> will ever need to support and then add an additional artificial partition
>> key that segments the rows sufficiently to keep the partition size
>> under the maximum. In the case of time series data, this is often done by
>> bucketing by time period, i.e. creating a new partition every minute, hour
>> or day. For non-time series data by doing something like
>> Hash(clustering-key) mod desired-number-of-partitions.
>>
>> In my case, multi-row queries to support a REST API typically return a
>> page of results, where the page size might be anywhere from a few dozen up
>> to thousands. For query efficiency I want the average number of rows per
>> partition to be large enough that a query can be satisfied by reading a
>> small number of partitions--ideally one.
>>
>> So I want to simultaneously limit the maximum number of rows per
>> partition and yet maintain a large enough average number of rows per
>> partition to make my queries efficient. But with my data the ratio between
>> maximum and average can be very large (up to four orders of magnitude).
>>
>> Here is an example:
>>
>>
>>                  Rows per Partition   Partition Size
>> Mode                              1             1 KB
>> Median                          500           500 KB
>> 90th percentile               5,000             5 MB
>> 99th percentile              50,000            50 MB
>> Maximum                   2,500,000           2.5 GB
>>
>> In this case, 99% of my data could fit in a single 50 MB partition. But
>> if I use the standard approach, I have to split my partitions into 50
>> pieces to accommodate the largest data. That means that to query the 700
>> rows for my median case, I have to read 50 partitions instead of one.
>>
>> If you try to deal with this by starting a new partition when an old one
>> fills up, you have a nasty distributed consensus problem, along with
>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>> with this, but might help with the consensus part today. But there are
>> still some nasty corner cases.
>>
>> I have some thoughts on other ways to solve this, but they all have
>> drawbacks. So I thought I'd ask here and hope that someone has a better
>> approach.
>>
>> Thanks in advance,
>>
>> Jim
>>
>>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Clint Martin <cl...@coolfiretechnologies.com>.
You should endeavor to use a repeatable method of segmenting your data.
Swapping partitions every time you "fill one" seems like an anti-pattern to
me, but I suppose it really depends on what your primary key is. Can you
share some more information on this?

In the past I have utilized the consistent hash method you described (add
an artificial row key segment by taking some part of the clustering key
modulo a fixed partition count) combined with a lazy evaluation cursor.

The lazy evaluation cursor essentially is set up to query X number of
partitions simultaneously, but to execute those queries only as needed to
fill the page size. To perform paging you have to know the last primary key
that was returned so you can use that to limit the next iteration.
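
Roughly, each step of the cursor runs a per-bucket query along these lines
(table and column names are borrowed from the hypothetical sketch earlier in
the thread; the page size of 100 is just an example):

    -- Next page within one bucket, resuming after the last clustering key
    -- returned on the previous page:
    SELECT object_id, payload FROM objects_by_customer
     WHERE customer_id = 'acme' AND bucket = 3
       AND object_id > 9f2b6d3a-0000-0000-0000-000000000000
     LIMIT 100;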

You can trade latency for additional work load by controlling the number of
concurrent executions you do as the iterating occurs. Or you can minimize
the work on your cluster by querying each partition one at a time.

Unfortunately, due to the artificial partition key segment, you cannot
iterate or page in any particular order (at least across partitions),
unless your hash function can also provide some ordering guarantees.

It all just depends on your requirements.

Clint
On Jan 4, 2016 10:13 AM, "Jim Ancona" <ji...@anconafamily.com> wrote:

> A problem that I have run into repeatedly when doing schema design is how
> to control partition size while still allowing for efficient multi-row
> queries.
>
> We want to limit partition size to some number between 10 and 100
> megabytes to avoid operational issues. The standard way to do that is to
> figure out the maximum number of rows that your "natural partition key"
> will ever need to support and then add an additional artificial partition
> key that segments the rows sufficiently to keep the partition size
> under the maximum. In the case of time series data, this is often done by
> bucketing by time period, i.e. creating a new partition every minute, hour
> or day. For non-time series data by doing something like
> Hash(clustering-key) mod desired-number-of-partitions.
>
> In my case, multi-row queries to support a REST API typically return a
> page of results, where the page size might be anywhere from a few dozen up
> to thousands. For query efficiency I want the average number of rows per
> partition to be large enough that a query can be satisfied by reading a
> small number of partitions--ideally one.
>
> So I want to simultaneously limit the maximum number of rows per partition
> and yet maintain a large enough average number of rows per partition to
> make my queries efficient. But with my data the ratio between maximum and
> average can be very large (up to four orders of magnitude).
>
> Here is an example:
>
>
>                  Rows per Partition   Partition Size
> Mode                              1             1 KB
> Median                          500           500 KB
> 90th percentile               5,000             5 MB
> 99th percentile              50,000            50 MB
> Maximum                   2,500,000           2.5 GB
>
> In this case, 99% of my data could fit in a single 50 MB partition. But if
> I use the standard approach, I have to split my partitions into 50 pieces
> to accommodate the largest data. That means that to query the 700 rows for
> my median case, I have to read 50 partitions instead of one.
>
> If you try to deal with this by starting a new partition when an old one
> fills up, you have a nasty distributed consensus problem, along with
> read-before-write. Cassandra LWT wasn't available the last time I dealt
> with this, but might help with the consensus part today. But there are
> still some nasty corner cases.
>
> I have some thoughts on other ways to solve this, but they all have
> drawbacks. So I thought I'd ask here and hope that someone has a better
> approach.
>
> Thanks in advance,
>
> Jim
>
>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Jim Ancona <ji...@anconafamily.com>.
On Tue, Jan 5, 2016 at 5:52 PM, Jonathan Haddad <jo...@jonhaddad.com> wrote:

> You could keep a "num_buckets" value associated with the client's account,
> which can be adjusted accordingly as usage increases.
>

Yes, but the adjustment problem is tricky when there are multiple
concurrent writers. What happens when you change the number of buckets?
Does existing data have to be re-written into new buckets? If so, how do
you make sure that's only done once for each bucket size increase? Or
perhaps I'm misunderstanding your suggestion?

Jim


> On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona <ji...@anconafamily.com> wrote:
>
>> On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <
>> clintlmartin@coolfiretechnologies.com> wrote:
>>
>>> What sort of data is your clustering key composed of? That might help
>>> some in determining a way to achieve what you're looking for.
>>>
>> Just a UUID that acts as an object identifier.
>>
>>>
>>> Clint
>>> On Jan 5, 2016 2:28 PM, "Jim Ancona" <ji...@anconafamily.com> wrote:
>>>
>>>> Hi Nate,
>>>>
>>>> Yes, I've been thinking about treating customers as either small or
>>>> big, where "small" ones have a single partition and big ones have 50 (or
>>>> whatever number I need to keep sizes reasonable). There's still the problem
>>>> of how to handle a small customer who becomes too big, but that will happen
>>>> much less frequently than a customer filling a partition.
>>>>
>>>> Jim
>>>>
>>>> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <na...@thelastpickle.com>
>>>> wrote:
>>>>
>>>>>
>>>>>> In this case, 99% of my data could fit in a single 50 MB partition.
>>>>>> But if I use the standard approach, I have to split my partitions into 50
>>>>>> pieces to accommodate the largest data. That means that to query the 700
>>>>>> rows for my median case, I have to read 50 partitions instead of one.
>>>>>>
>>>>>> If you try to deal with this by starting a new partition when an old
>>>>>> one fills up, you have a nasty distributed consensus problem, along with
>>>>>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>>>>>> with this, but might help with the consensus part today. But there are
>>>>>> still some nasty corner cases.
>>>>>>
>>>>>> I have some thoughts on other ways to solve this, but they all have
>>>>>> drawbacks. So I thought I'd ask here and hope that someone has a better
>>>>>> approach.
>>>>>>
>>>>>>
>>>>> Hi Jim - good to see you around again.
>>>>>
>>>>> If you can segment this upstream by customer/account/whatever,
>>>>> handling the outliers as an entirely different code path (potentially
>>>>> different cluster as the workload will be quite different at that point and
>>>>> have different tuning requirements) would be your best bet. Then a
>>>>> read-before-write makes sense given it is happening on such a small number
>>>>> of API queries.
>>>>>
>>>>>
>>>>> --
>>>>> -----------------
>>>>> Nate McCall
>>>>> Austin, TX
>>>>> @zznate
>>>>>
>>>>> Co-Founder & Sr. Technical Consultant
>>>>> Apache Cassandra Consulting
>>>>> http://www.thelastpickle.com
>>>>>
>>>>
>>>>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
You could keep a "num_buckets" value associated with the client's account,
which can be adjusted accordingly as usage increases.
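
As a sketch (the table is hypothetical; the LWT condition is one way to keep
two concurrent writers from both bumping the value):

    CREATE TABLE account_settings (
        customer_id text PRIMARY KEY,
        num_buckets int
    );

    -- Grow the bucket count only if nobody else has already done so.
    UPDATE account_settings SET num_buckets = 50
     WHERE customer_id = 'acme'
     IF num_buckets = 1;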

On Tue, Jan 5, 2016 at 2:17 PM Jim Ancona <ji...@anconafamily.com> wrote:

> On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <
> clintlmartin@coolfiretechnologies.com> wrote:
>
>> What sort of data is your clustering key composed of? That might help
>> some in determining a way to achieve what you're looking for.
>>
> Just a UUID that acts as an object identifier.
>
>>
>> Clint
>> On Jan 5, 2016 2:28 PM, "Jim Ancona" <ji...@anconafamily.com> wrote:
>>
>>> Hi Nate,
>>>
>>> Yes, I've been thinking about treating customers as either small or big,
>>> where "small" ones have a single partition and big ones have 50 (or
>>> whatever number I need to keep sizes reasonable). There's still the problem
>>> of how to handle a small customer who becomes too big, but that will happen
>>> much less frequently than a customer filling a partition.
>>>
>>> Jim
>>>
>>> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <na...@thelastpickle.com>
>>> wrote:
>>>
>>>>
>>>>> In this case, 99% of my data could fit in a single 50 MB partition.
>>>>> But if I use the standard approach, I have to split my partitions into 50
>>>>> pieces to accommodate the largest data. That means that to query the 700
>>>>> rows for my median case, I have to read 50 partitions instead of one.
>>>>>
>>>>> If you try to deal with this by starting a new partition when an old
>>>>> one fills up, you have a nasty distributed consensus problem, along with
>>>>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>>>>> with this, but might help with the consensus part today. But there are
>>>>> still some nasty corner cases.
>>>>>
>>>>> I have some thoughts on other ways to solve this, but they all have
>>>>> drawbacks. So I thought I'd ask here and hope that someone has a better
>>>>> approach.
>>>>>
>>>>>
>>>> Hi Jim - good to see you around again.
>>>>
>>>> If you can segment this upstream by customer/account/whatever, handling
>>>> the outliers as an entirely different code path (potentially different
>>>> cluster as the workload will be quite different at that point and have
>>>> different tuning requirements) would be your best bet. Then a
>>>> read-before-write makes sense given it is happening on such a small number
>>>> of API queries.
>>>>
>>>>
>>>> --
>>>> -----------------
>>>> Nate McCall
>>>> Austin, TX
>>>> @zznate
>>>>
>>>> Co-Founder & Sr. Technical Consultant
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>
>>>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Jim Ancona <ji...@anconafamily.com>.
On Tue, Jan 5, 2016 at 4:56 PM, Clint Martin <
clintlmartin@coolfiretechnologies.com> wrote:

> What sort of data is your clustering key composed of? That might help some
> in determining a way to achieve what you're looking for.
>
Just a UUID that acts as an object identifier.

>
> Clint
> On Jan 5, 2016 2:28 PM, "Jim Ancona" <ji...@anconafamily.com> wrote:
>
>> Hi Nate,
>>
>> Yes, I've been thinking about treating customers as either small or big,
>> where "small" ones have a single partition and big ones have 50 (or
>> whatever number I need to keep sizes reasonable). There's still the problem
>> of how to handle a small customer who becomes too big, but that will happen
>> much less frequently than a customer filling a partition.
>>
>> Jim
>>
>> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <na...@thelastpickle.com>
>> wrote:
>>
>>>
>>>> In this case, 99% of my data could fit in a single 50 MB partition. But
>>>> if I use the standard approach, I have to split my partitions into 50
>>>> pieces to accommodate the largest data. That means that to query the 700
>>>> rows for my median case, I have to read 50 partitions instead of one.
>>>>
>>>> If you try to deal with this by starting a new partition when an old
>>>> one fills up, you have a nasty distributed consensus problem, along with
>>>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>>>> with this, but might help with the consensus part today. But there are
>>>> still some nasty corner cases.
>>>>
>>>> I have some thoughts on other ways to solve this, but they all have
>>>> drawbacks. So I thought I'd ask here and hope that someone has a better
>>>> approach.
>>>>
>>>>
>>> Hi Jim - good to see you around again.
>>>
>>> If you can segment this upstream by customer/account/whatever, handling
>>> the outliers as an entirely different code path (potentially different
>>> cluster as the workload will be quite different at that point and have
>>> different tuning requirements) would be your best bet. Then a
>>> read-before-write makes sense given it is happening on such a small number
>>> of API queries.
>>>
>>>
>>> --
>>> -----------------
>>> Nate McCall
>>> Austin, TX
>>> @zznate
>>>
>>> Co-Founder & Sr. Technical Consultant
>>> Apache Cassandra Consulting
>>> http://www.thelastpickle.com
>>>
>>
>>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Clint Martin <cl...@coolfiretechnologies.com>.
What sort of data is your clustering key composed of? That might help some
in determining a way to achieve what you're looking for.

Clint
On Jan 5, 2016 2:28 PM, "Jim Ancona" <ji...@anconafamily.com> wrote:

> Hi Nate,
>
> Yes, I've been thinking about treating customers as either small or big,
> where "small" ones have a single partition and big ones have 50 (or
> whatever number I need to keep sizes reasonable). There's still the problem
> of how to handle a small customer who becomes too big, but that will happen
> much less frequently than a customer filling a partition.
>
> Jim
>
> On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <na...@thelastpickle.com>
> wrote:
>
>>
>>> In this case, 99% of my data could fit in a single 50 MB partition. But
>>> if I use the standard approach, I have to split my partitions into 50
>>> pieces to accommodate the largest data. That means that to query the 700
>>> rows for my median case, I have to read 50 partitions instead of one.
>>>
>>> If you try to deal with this by starting a new partition when an old one
>>> fills up, you have a nasty distributed consensus problem, along with
>>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>>> with this, but might help with the consensus part today. But there are
>>> still some nasty corner cases.
>>>
>>> I have some thoughts on other ways to solve this, but they all have
>>> drawbacks. So I thought I'd ask here and hope that someone has a better
>>> approach.
>>>
>>>
>> Hi Jim - good to see you around again.
>>
>> If you can segment this upstream by customer/account/whatever, handling
>> the outliers as an entirely different code path (potentially different
>> cluster as the workload will be quite different at that point and have
>> different tuning requirements) would be your best bet. Then a
>> read-before-write makes sense given it is happening on such a small number
>> of API queries.
>>
>>
>> --
>> -----------------
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>
>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Jim Ancona <ji...@anconafamily.com>.
Hi Nate,

Yes, I've been thinking about treating customers as either small or big,
where "small" ones have a single partition and big ones have 50 (or
whatever number I need to keep sizes reasonable). There's still the problem
of how to handle a small customer who becomes too big, but that will happen
much less frequently than a customer filling a partition.
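
In schema terms I'm picturing something like this (names are hypothetical;
the application routes each customer to one table or the other):

    -- Small customers: one partition per customer.
    CREATE TABLE objects_small (
        customer_id text,
        object_id   uuid,
        payload     blob,
        PRIMARY KEY (customer_id, object_id)
    );

    -- Big customers: a fixed number of hash buckets (say 50).
    CREATE TABLE objects_big (
        customer_id text,
        bucket      int,
        object_id   uuid,
        payload     blob,
        PRIMARY KEY ((customer_id, bucket), object_id)
    );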

Jim

On Tue, Jan 5, 2016 at 12:21 PM, Nate McCall <na...@thelastpickle.com> wrote:

>
>> In this case, 99% of my data could fit in a single 50 MB partition. But
>> if I use the standard approach, I have to split my partitions into 50
>> pieces to accommodate the largest data. That means that to query the 700
>> rows for my median case, I have to read 50 partitions instead of one.
>>
>> If you try to deal with this by starting a new partition when an old one
>> fills up, you have a nasty distributed consensus problem, along with
>> read-before-write. Cassandra LWT wasn't available the last time I dealt
>> with this, but might help with the consensus part today. But there are
>> still some nasty corner cases.
>>
>> I have some thoughts on other ways to solve this, but they all have
>> drawbacks. So I thought I'd ask here and hope that someone has a better
>> approach.
>>
>>
> Hi Jim - good to see you around again.
>
> If you can segment this upstream by customer/account/whatever, handling
> the outliers as an entirely different code path (potentially different
> cluster as the workload will be quite different at that point and have
> different tuning requirements) would be your best bet. Then a
> read-before-write makes sense given it is happening on such a small number
> of API queries.
>
>
> --
> -----------------
> Nate McCall
> Austin, TX
> @zznate
>
> Co-Founder & Sr. Technical Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>

Re: Data Modeling: Partition Size and Query Efficiency

Posted by Nate McCall <na...@thelastpickle.com>.
>
>
> In this case, 99% of my data could fit in a single 50 MB partition. But if
> I use the standard approach, I have to split my partitions into 50 pieces
> to accommodate the largest data. That means that to query the 700 rows for
> my median case, I have to read 50 partitions instead of one.
>
> If you try to deal with this by starting a new partition when an old one
> fills up, you have a nasty distributed consensus problem, along with
> read-before-write. Cassandra LWT wasn't available the last time I dealt
> with this, but might help with the consensus part today. But there are
> still some nasty corner cases.
>
> I have some thoughts on other ways to solve this, but they all have
> drawbacks. So I thought I'd ask here and hope that someone has a better
> approach.
>
>
Hi Jim - good to see you around again.

If you can segment this upstream by customer/account/whatever, handling the
outliers as an entirely different code path (potentially different cluster
as the workload will be quite different at that point and have different
tuning requirements) would be your best bet. Then a read-before-write makes
sense given it is happening on such a small number of API queries.


-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com