Posted to dev@cassandra.apache.org by Kant Kodali <ka...@peernova.com> on 2017/05/10 21:54:15 UTC

Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?

Hi All,

The Cassandra community has always recommended 100MB per partition as a
sweet spot. Does this limitation still exist, given that there is now a
B-tree implementation to identify rows inside a partition?

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/rows/BTreeRow.java

Thanks!

Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?

Posted by Michael Kjellman <mk...@internalcircle.com>.

I'm almost done with a rebased trunk patch. I hit a few snags. I want nothing more than to finish this thing... The latest issue was due to range tombstones and the fact that the deletion time has been stored in the index from 3.0 onwards. I hope to have everything pushed very shortly. Sorry for the delay, I'm doing my best... there are never enough hours in the day. :)

best,
kjellman 

> On May 11, 2017, at 1:48 AM, Kant Kodali <ka...@peernova.com> wrote:
> 
> oh this looks like one I am looking for
> https://issues.apache.org/jira/browse/CASSANDRA-9754. Is this in Cassandra
> 3.10 or merged somewhere?

Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?

Posted by Kant Kodali <ka...@peernova.com>.

Oh, this looks like the one I am looking for:
https://issues.apache.org/jira/browse/CASSANDRA-9754. Is this in Cassandra
3.10 or merged somewhere?

On Thu, May 11, 2017 at 1:13 AM, Kant Kodali <ka...@peernova.com> wrote:

> Hi DuyHai,
>
> I am trying to see what are the possible things we can do to get over this
> limitation?
>
> 1. Would this https://issues.apache.org/jira/browse/CASSANDRA-7447 help
> at all?
> 2. Can we have Merkle trees built for groups of rows in partition ? such
> that we can stream only those groups where the hash is different?
> 3. It would be interesting to see if we can spread a partition across
> nodes.
>
> I am just trying to validate some ideas that can help potentially get over
> this 100MB limitation since we may not always fit into a time series model.
>
> Thanks!

Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?

Posted by Kant Kodali <ka...@peernova.com>.

Hi DuyHai,

I am trying to see what we can do to get past this limitation.

1. Would https://issues.apache.org/jira/browse/CASSANDRA-7447 help at all?
2. Can we have Merkle trees built for groups of rows within a partition, so
that we can stream only those groups whose hashes differ?
3. It would be interesting to see if we can spread a partition across nodes.

I am just trying to validate some ideas that could help get past this 100MB
limitation, since our data may not always fit a time series model.

Thanks!
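
To make idea 2 above concrete, here is a minimal sketch in Python. It is
not how Cassandra repair works and it is not a full Merkle tree: it computes
only flat per-group digests (the leaf level), and it assumes integer
clustering keys so that rows can be grouped by key range. The names
group_digests, groups_to_stream, and group_width are hypothetical.

```python
import hashlib
from collections import defaultdict


def group_digests(rows, group_width):
    """Return {group_id: digest} for one replica's copy of a partition.

    `rows` maps an integer clustering key to a value (an assumed row shape).
    Rows are grouped by key range (key // group_width), so a missing row
    only changes the digest of its own group.
    """
    groups = defaultdict(list)
    for clustering_key, value in rows.items():
        groups[clustering_key // group_width].append((clustering_key, value))

    digests = {}
    for group_id, members in groups.items():
        h = hashlib.sha256()
        for member in sorted(members):
            # repr() of the (key, value) pair keeps the encoding unambiguous.
            h.update(repr(member).encode("utf-8"))
        digests[group_id] = h.hexdigest()
    return digests


def groups_to_stream(local_rows, remote_rows, group_width):
    """Return the sorted ids of row groups whose digests differ."""
    local = group_digests(local_rows, group_width)
    remote = group_digests(remote_rows, group_width)
    return sorted(
        g for g in local.keys() | remote.keys() if local.get(g) != remote.get(g)
    )


# Two replicas of a 1000-row partition that differ in a single row.
replica_a = {i: f"v{i}" for i in range(1000)}
replica_b = dict(replica_a)
replica_b[537] = "changed"

print(groups_to_stream(replica_a, replica_b, 100))  # prints [5]
```

In this toy case only the group holding key 537 would need to be streamed,
rather than the whole partition. A real design would also have to handle
tombstones, non-integer clustering keys, and the cost of building the
digests.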

On Thu, May 11, 2017 at 12:37 AM, DuyHai Doan <do...@gmail.com> wrote:

> Yes the recommendation still applies
>
> Wide partitions have huge impact on repair (over streaming), compaction
> and bootstrap

Re: Does the partition size limitation still exist in Cassandra 3.10 given there is a B-tree implementation?

Posted by DuyHai Doan <do...@gmail.com>.

Yes, the recommendation still applies.

Wide partitions have a huge impact on repair (over-streaming), compaction, and
bootstrap.

On 10 May 2017 at 23:54, "Kant Kodali" <ka...@peernova.com> wrote:

Hi All,

Cassandra community had always been recommending 100MB per partition as a
sweet spot however does this limitation still exist given there is a B-tree
implementation to identify rows inside a partition?

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/rows/BTreeRow.java

Thanks!