Posted to user@cassandra.apache.org by Jack Krupansky <ja...@gmail.com> on 2016/03/01 06:23:51 UTC

Re: Practical limit on number of column families

3,000 entries? What's an "entry"? Do you mean row, column, or... what?

You are using the obsolete terminology of CQL2 and Thrift - column family.
With CQL3 you should be creating "tables". The practical recommendation of
an upper limit of a few hundred tables across all key spaces remains.

Technically you can go higher and technically you can reduce the overhead
per table (an undocumented Jira - intentionally undocumented since it is
strongly not recommended), but... it is unlikely that you will be happy
with the results.

What is the nature of the use case?

You basically have two choices: an additional clustering column to
distinguish categories of table, or a separate cluster for each few hundred
tables.
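A minimal CQL sketch of the first option, with hypothetical keyspace, table, and column names (a common variant of this suggestion folds the distinguishing column into the partition key, so each category lives in its own partitions of one shared table rather than in its own table):

```cql
-- Hypothetical example: instead of one table per category,
-- fold the category into the primary key of a single table.
CREATE TABLE my_keyspace.events_by_category (
    category text,        -- would otherwise have been a table name
    id       timeuuid,
    payload  text,
    PRIMARY KEY ((category), id)
);

-- Queries stay per-category by filtering on the partition key:
-- SELECT * FROM my_keyspace.events_by_category WHERE category = 'audit';
```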


-- Jack Krupansky

On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
fernando.jimenez@wealth-port.com> wrote:

> Hi all
>
> I have a use case for Cassandra that would require creating a large number
> of column families. I have found references to early versions of Cassandra
> where each column family would require a fixed amount of memory on all
> nodes, effectively imposing an upper limit on the total number of CFs. I
> have also seen rumblings that this may have been fixed in later versions.
>
> To put the question to rest, I have setup a DSE sandbox and created some
> code to generate column families populated with 3,000 entries each.
>
> Unfortunately I have now hit this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-9291
>
> So I will have to retest against Cassandra 3.0 instead
>
> However, I would like to understand the limitations regarding creation of
> column families.
>
> * Is there a practical upper limit?
> * is this a fixed limit, or does it scale as more nodes are added into the
> cluster?
> * Is there a difference between one keyspace with thousands of column
> families, vs thousands of keyspaces with only a few column families each?
>
> I haven’t found any hard evidence/documentation to help me here, but if
> you can point me in the right direction, I will oblige and RTFM away.
>
> Many thanks for your help!
>
> Cheers
> FJ
>
>
>

Re: Practical limit on number of column families

Posted by Vlad <qa...@yahoo.com>.
>If your Jira search fu is strong enough
And it is! )

>you should be able to find it yourself
And I did! )

I see that this issue originates from a problem with the Java GC's design, but judging by the date, that was the Java 6 era. Now we have Java 8 with a new GC mechanism.
Does this problem still exist with Java 8? Any chance of using the original method to reduce the overhead and "be happy with the results"?
Regards, Vlad
 

    On Tuesday, March 1, 2016 4:07 PM, Jack Krupansky <ja...@gmail.com> wrote:
 

I'll defer to one of the senior committers as to whether they want that information disseminated any further than it already is. It was intentionally not documented since it is not recommended. If your Jira search fu is strong enough you should be able to find it yourself, but again, its use is strongly not recommended.
As the Jira notes, "having more than dozens or hundreds of tables defined is almost certainly a Bad Idea."
"Bad Idea" means not good. As in don't go there. And if you do, don't expect such a misadventure to be supported by the community.
-- Jack Krupansky
On Tue, Mar 1, 2016 at 8:39 AM, Vlad <qa...@yahoo.com> wrote:

Hi Jack,
>you can reduce the overhead per table (an undocumented Jira)
Can you please point to this Jira number?

>it is strongly not recommended
What are the consequences of this (besides performance degradation, if any)?
Thanks.


    On Tuesday, March 1, 2016 7:23 AM, Jack Krupansky <ja...@gmail.com> wrote:
 

 3,000 entries? What's an "entry"? Do you mean row, column, or... what?

You are using the obsolete terminology of CQL2 and Thrift - column family. With CQL3 you should be creating "tables". The practical recommendation of an upper limit of a few hundred tables across all key spaces remains.
Technically you can go higher and technically you can reduce the overhead per table (an undocumented Jira - intentionally undocumented since it is strongly not recommended), but... it is unlikely that you will be happy with the results.
What is the nature of the use case?
You basically have two choices: an additional cluster column to distinguish categories of table, or separate clusters for each few hundred of tables.

-- Jack Krupansky
On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <fe...@wealth-port.com> wrote:

Hi all
I have a use case for Cassandra that would require creating a large number of column families. I have found references to early versions of Cassandra where each column family would require a fixed amount of memory on all nodes, effectively imposing an upper limit on the total number of CFs. I have also seen rumblings that this may have been fixed in later versions.
To put the question to rest, I have setup a DSE sandbox and created some code to generate column families populated with 3,000 entries each.
Unfortunately I have now hit this issue: https://issues.apache.org/jira/browse/CASSANDRA-9291
So I will have to retest against Cassandra 3.0 instead
However, I would like to understand the limitations regarding creation of column families. 
 * Is there a practical upper limit?
 * Is this a fixed limit, or does it scale as more nodes are added into the cluster?
 * Is there a difference between one keyspace with thousands of column families, vs thousands of keyspaces with only a few column families each?
I haven’t found any hard evidence/documentation to help me here, but if you can point me in the right direction, I will oblige and RTFM away.
Many thanks for your help!
Cheers
FJ


Re: Practical limit on number of column families

Posted by Jack Krupansky <ja...@gmail.com>.
I'll defer to one of the senior committers as to whether they want that
information disseminated any further than it already is. It was
intentionally not documented since it is not recommended. If your Jira
search fu is strong enough you should be able to find it yourself, but
again, its use is strongly not recommended.

As the Jira notes, "having more than dozens or hundreds of tables defined
is almost certainly a Bad Idea."

"Bad Idea" means not good. As in don't go there. And if you do, don't
expect such a misadventure to be supported by the community.

-- Jack Krupansky

On Tue, Mar 1, 2016 at 8:39 AM, Vlad <qa...@yahoo.com> wrote:

> Hi Jack,
>
> >you can reduce the overhead per table  an undocumented Jira
> Can you please point to this Jira number?
>
> >it is strongly not recommended
> What is consequences of this (besides performance degradation, if any)?
>
> Thanks.
>
>
> On Tuesday, March 1, 2016 7:23 AM, Jack Krupansky <
> jack.krupansky@gmail.com> wrote:
>
>
> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>
> You are using the obsolete terminology of CQL2 and Thrift - column family.
> With CQL3 you should be creating "tables". The practical recommendation of
> an upper limit of a few hundred tables across all key spaces remains.
>
> Technically you can go higher and technically you can reduce the overhead
> per table (an undocumented Jira - intentionally undocumented since it is
> strongly not recommended), but... it is unlikely that you will be happy
> with the results.
>
> What is the nature of the use case?
>
> You basically have two choices: an additional cluster column to
> distinguish categories of table, or separate clusters for each few hundred
> of tables.
>
>
> -- Jack Krupansky
>
> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
> fernando.jimenez@wealth-port.com> wrote:
>
> Hi all
>
> I have a use case for Cassandra that would require creating a large number
> of column families. I have found references to early versions of Cassandra
> where each column family would require a fixed amount of memory on all
> nodes, effectively imposing an upper limit on the total number of CFs. I
> have also seen rumblings that this may have been fixed in later versions.
>
> To put the question to rest, I have setup a DSE sandbox and created some
> code to generate column families populated with 3,000 entries each.
>
> Unfortunately I have now hit this issue:
> https://issues.apache.org/jira/browse/CASSANDRA-9291
>
> So I will have to retest against Cassandra 3.0 instead
>
> However, I would like to understand the limitations regarding creation of
> column families.
>
> * Is there a practical upper limit?
> * is this a fixed limit, or does it scale as more nodes are added into the
> cluster?
> * Is there a difference between one keyspace with thousands of column
> families, vs thousands of keyspaces with only a few column families each?
>
> I haven’t found any hard evidence/documentation to help me here, but if
> you can point me in the right direction, I will oblige and RTFM away.
>
> Many thanks for your help!
>
> Cheers
> FJ
>
>
>
>
>
>

Re: Practical limit on number of column families

Posted by Vlad <qa...@yahoo.com>.
Hi Jack,
>you can reduce the overhead per table (an undocumented Jira)
Can you please point to this Jira number?

>it is strongly not recommended
What are the consequences of this (besides performance degradation, if any)?
Thanks.


    On Tuesday, March 1, 2016 7:23 AM, Jack Krupansky <ja...@gmail.com> wrote:
 

 3,000 entries? What's an "entry"? Do you mean row, column, or... what?

You are using the obsolete terminology of CQL2 and Thrift - column family. With CQL3 you should be creating "tables". The practical recommendation of an upper limit of a few hundred tables across all key spaces remains.
Technically you can go higher and technically you can reduce the overhead per table (an undocumented Jira - intentionally undocumented since it is strongly not recommended), but... it is unlikely that you will be happy with the results.
What is the nature of the use case?
You basically have two choices: an additional cluster column to distinguish categories of table, or separate clusters for each few hundred of tables.

-- Jack Krupansky
On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <fe...@wealth-port.com> wrote:

Hi all
I have a use case for Cassandra that would require creating a large number of column families. I have found references to early versions of Cassandra where each column family would require a fixed amount of memory on all nodes, effectively imposing an upper limit on the total number of CFs. I have also seen rumblings that this may have been fixed in later versions.
To put the question to rest, I have setup a DSE sandbox and created some code to generate column families populated with 3,000 entries each.
Unfortunately I have now hit this issue: https://issues.apache.org/jira/browse/CASSANDRA-9291
So I will have to retest against Cassandra 3.0 instead
However, I would like to understand the limitations regarding creation of column families. 
 * Is there a practical upper limit?
 * Is this a fixed limit, or does it scale as more nodes are added into the cluster?
 * Is there a difference between one keyspace with thousands of column families, vs thousands of keyspaces with only a few column families each?
I haven’t found any hard evidence/documentation to help me here, but if you can point me in the right direction, I will oblige and RTFM away.
Many thanks for your help!
Cheers
FJ


Re: Practical limit on number of column families

Posted by Jack Krupansky <ja...@gmail.com>.
It is the total table count, across all key spaces. Memory is memory.

-- Jack Krupansky

On Tue, Mar 1, 2016 at 6:26 PM, Brian Sam-Bodden <bs...@integrallis.com>
wrote:

> Eric,
>   Is the keyspace as a multitenancy solution as bad as the many tables
> pattern? Is the memory overhead of keyspaces as heavy as that of tables?
>
> Cheers,
> Brian
>
>
> On Tuesday, March 1, 2016, Eric Stevens <mi...@gmail.com> wrote:
>
>> It's definitely not true for every use case of a large number of tables,
>> but for many uses where you'd be tempted to do that, adding whatever would
>> have driven your table naming instead as a column in your partition key on
>> a smaller number of tables will meet your needs.  This is especially true
>> if you're looking to solve multi-tenancy, unless you let your tenants
>> dynamically drive your schema (which is a separate can of worms).
>>
>> On Tue, Mar 1, 2016 at 9:08 AM Jack Krupansky <ja...@gmail.com>
>> wrote:
>>
>>> I don't think Cassandra was "purposefully developed" for some target
>>> number of tables - there is no evidence of any such an explicit intent.
>>> Instead, it would be fair to say that Cassandra was "not purposefully
>>> developed" with a goal of supporting "large numbers of tables." Sometimes
>>> features and capabilities come for free or as a side effect of the
>>> technologies used, but usually specific features and specific capabilities
>>> (such as large numbers of tables) require explicit intent and explicit
>>> effort.
>>>
>>> One could indeed endeavor to design a data store (I'm not even sure it
>>> would still be considered a database per se) that supported either large
>>> numbers of tables or an additional level of storage model in between table
>>> and row (call it "group" maybe or "sub-table".) But obviously Cassandra was
>>> not designed with that goal in mind.
>>>
>>> Traditionally, a "table" is a defined relation over a set of data.
>>> Relation and data are distinct concepts. And a relation name is not simply
>>> a Java-style "object". A relation (table) name is supposed to represent an
>>> abstraction or entity type, while essentially all of the cases I have heard
>>> of for wanting thousands (or even hundreds) of tables are trying to use
>>> table as more of a container for a group of rows for a specific entity
>>> instance rather than a distinct entity type. Granted, Cassandra is not
>>> obligated to be limited to the relational model, but Cassandra, especially
>>> CQL, is intentionally modeled reasonably closely with the relational model
>>> in terms of the data modeling abstractions even though the storage engine
>>> is designed to scale across nodes.
>>>
>>> You could file a Jira requesting such a feature improvement. And then we
>>> would see if sentiment has shifted over the years.
>>>
>>> The key thing is to offer up a use case that warrants support for large
>>> numbers of tables. So far, it has usually been the case that the perceived
>>> need for separate tables could easily be met using clustering columns of a
>>> single table.
>>>
>>> Seriously, if you guys can define a legitimate use case that can't
>>> easily be handled by a single table, that could get the discussion started.
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Mar 1, 2016 at 9:11 AM, Fernando Jimenez <
>>> fernando.jimenez@wealth-port.com> wrote:
>>>
>>>> Hi Jack
>>>>
>>>> Being purposefully developed to only handle up to “a few hundred”
>>>> tables is reason enough. I accept that, and likely a use case with many
>>>> tables was never really considered. But I would still like to understand
>>>> the design choices made so perhaps we gain some confidence level in this
>>>> upper limit in the number of tables. The best estimate we have so far is “a
>>>> few hundred” which is a bit vague.
>>>>
>>>> Regarding scaling, I’m not talking about scaling in terms of data
>>>> volume, but on how the data is structured. One thousand tables with one row
>>>> each is the same data volume as one table with one thousand rows, excluding
>>>> any data structures required to maintain the extra tables. But whereas the
>>>> first seems likely to bring a Cassandra cluster to its knees, the second
>>>> will run happily on a single node cluster in a low end machine.
>>>>
>>>> We will design our code to use a single table to avoid having
>>>> nightmares with this issue. But if there is any authoritative documentation
>>>> on this characteristic of Cassandra, I would love to know more.
>>>>
>>>> FJ
>>>>
>>>>
>>>> On 01 Mar 2016, at 14:23, Jack Krupansky <ja...@gmail.com>
>>>> wrote:
>>>>
>>>> I don't think there are any "reasons behind it." It is simply empirical
>>>> experience - as reported here.
>>>>
>>>> Cassandra scales in two dimension - number of rows per node and number
>>>> of nodes. If some source of information lead you to believe otherwise,
>>>> please point out the source so that we can endeavor to correct it.
>>>>
>>>> The exact number of rows per node and tables per node will always have
>>>> to be evaluated empirically - a proof of concept implementation, since it
>>>> all depends on the mix of capabilities of your hardware combined with your
>>>> specific data model, your specific data values, your specific access
>>>> patterns, and your specific load. And it also depends on your own personal
>>>> tolerance for degradation of latency and throughput - some people might
>>>> find a given set of performance  metrics acceptable while other might not.
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <
>>>> fernando.jimenez@wealth-port.com> wrote:
>>>>
>>>>> Hi Tommaso
>>>>>
>>>>> It’s not that I _need_ a large number of tables. This approach maps
>>>>> easily to the problem we are trying to solve, but it’s becoming clear it’s
>>>>> not the right approach.
>>>>>
>>>>> At the moment I’m trying to understand the limitations in Cassandra
>>>>> regarding number of Tables and the reasons behind it. I’ve come to the
>>>>> email list as my Google-fu is not giving me what I’m looking for :(
>>>>>
>>>>> FJ
>>>>>
>>>>>
>>>>>
>>>>> On 01 Mar 2016, at 09:36, tommaso barbugli <tb...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Fernando,
>>>>>
>>>>> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it
>>>>> was a real pain in terms of operations. Repairs were terribly slow, booting
>>>>> C* slowed down, and in general tracking table metrics became a bit more work.
>>>>> Why do you need this high number of tables?
>>>>>
>>>>> Tommaso
>>>>>
>>>>> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <
>>>>> fernando.jimenez@wealth-port.com> wrote:
>>>>>
>>>>>> Hi Jack
>>>>>>
>>>>>> By entry I mean row
>>>>>>
>>>>>> Apologies for the “obsolete terminology”. When I first looked at
>>>>>> Cassandra it was still on CQL2, and now that I’m looking at it again I’ve
>>>>>> defaulted to the terms I already knew. I will bear it in mind and call them
>>>>>> tables from now on.
>>>>>>
>>>>>> Is there any documentation about this limit? for example, I’d be keen
>>>>>> to know how much memory is consumed per table, and I’m also curious about
>>>>>> the reasons for keeping this in memory. I’m trying to understand the
>>>>>> limitations here, rather than challenge them.
>>>>>>
>>>>>> So far I found nothing in my search, hence why I had to resort to
>>>>>> some “load testing” to see what happens when you push the table count high
>>>>>>
>>>>>> Thanks
>>>>>> FJ
>>>>>>
>>>>>>
>>>>>> On 01 Mar 2016, at 06:23, Jack Krupansky <ja...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>>>>>
>>>>>> You are using the obsolete terminology of CQL2 and Thrift - column
>>>>>> family. With CQL3 you should be creating "tables". The practical
>>>>>> recommendation of an upper limit of a few hundred tables across all key
>>>>>> spaces remains.
>>>>>>
>>>>>> Technically you can go higher and technically you can reduce the
>>>>>> overhead per table (an undocumented Jira - intentionally undocumented since
>>>>>> it is strongly not recommended), but... it is unlikely that you will be
>>>>>> happy with the results.
>>>>>>
>>>>>> What is the nature of the use case?
>>>>>>
>>>>>> You basically have two choices: an additional cluster column to
>>>>>> distinguish categories of table, or separate clusters for each few hundred
>>>>>> of tables.
>>>>>>
>>>>>>
>>>>>> -- Jack Krupansky
>>>>>>
>>>>>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
>>>>>> fernando.jimenez@wealth-port.com> wrote:
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> I have a use case for Cassandra that would require creating a large
>>>>>>> number of column families. I have found references to early versions of
>>>>>>> Cassandra where each column family would require a fixed amount of memory
>>>>>>> on all nodes, effectively imposing an upper limit on the total number of
>>>>>>> CFs. I have also seen rumblings that this may have been fixed in later
>>>>>>> versions.
>>>>>>>
>>>>>>> To put the question to rest, I have setup a DSE sandbox and created
>>>>>>> some code to generate column families populated with 3,000 entries each.
>>>>>>>
>>>>>>> Unfortunately I have now hit this issue:
>>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>>>>>>>
>>>>>>> So I will have to retest against Cassandra 3.0 instead
>>>>>>>
>>>>>>> However, I would like to understand the limitations regarding
>>>>>>> creation of column families.
>>>>>>>
>>>>>>> * Is there a practical upper limit?
>>>>>>> * is this a fixed limit, or does it scale as more nodes are added
>>>>>>> into the cluster?
>>>>>>> * Is there a difference between one keyspace with thousands of
>>>>>>> column families, vs thousands of keyspaces with only a few column families
>>>>>>> each?
>>>>>>>
>>>>>>> I haven’t found any hard evidence/documentation to help me here, but
>>>>>>> if you can point me in the right direction, I will oblige and RTFM away.
>>>>>>>
>>>>>>> Many thanks for your help!
>>>>>>>
>>>>>>> Cheers
>>>>>>> FJ
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>
> --
> Cheers,
> Brian
> http://www.integrallis.com
>
>

Re: Practical limit on number of column families

Posted by Brian Sam-Bodden <bs...@integrallis.com>.
Eric,
  Is keyspace-per-tenant as a multitenancy solution as bad as the many-tables
pattern? Is the memory overhead of keyspaces as heavy as that of tables?

Cheers,
Brian

On Tuesday, March 1, 2016, Eric Stevens <mi...@gmail.com> wrote:

> It's definitely not true for every use case of a large number of tables,
> but for many uses where you'd be tempted to do that, adding whatever would
> have driven your table naming instead as a column in your partition key on
> a smaller number of tables will meet your needs.  This is especially true
> if you're looking to solve multi-tenancy, unless you let your tenants
> dynamically drive your schema (which is a separate can of worms).
>
> On Tue, Mar 1, 2016 at 9:08 AM Jack Krupansky <jack.krupansky@gmail.com
> <javascript:_e(%7B%7D,'cvml','jack.krupansky@gmail.com');>> wrote:
>
>> I don't think Cassandra was "purposefully developed" for some target
>> number of tables - there is no evidence of any such an explicit intent.
>> Instead, it would be fair to say that Cassandra was "not purposefully
>> developed" with a goal of supporting "large numbers of tables." Sometimes
>> features and capabilities come for free or as a side effect of the
>> technologies used, but usually specific features and specific capabilities
>> (such as large numbers of tables) require explicit intent and explicit
>> effort.
>>
>> One could indeed endeavor to design a data store (I'm not even sure it
>> would still be considered a database per se) that supported either large
>> numbers of tables or an additional level of storage model in between table
>> and row (call it "group" maybe or "sub-table".) But obviously Cassandra was
>> not designed with that goal in mind.
>>
>> Traditionally, a "table" is a defined relation over a set of data.
>> Relation and data are distinct concepts. And a relation name is not simply
>> a Java-style "object". A relation (table) name is supposed to represent an
>> abstraction or entity type, while essentially all of the cases I have heard
>> of for wanting thousands (or even hundreds) of tables are trying to use
>> table as more of a container for a group of rows for a specific entity
>> instance rather than a distinct entity type. Granted, Cassandra is not
>> obligated to be limited to the relational model, but Cassandra, especially
>> CQL, is intentionally modeled reasonably closely with the relational model
>> in terms of the data modeling abstractions even though the storage engine
>> is designed to scale across nodes.
>>
>> You could file a Jira requesting such a feature improvement. And then we
>> would see if sentiment has shifted over the years.
>>
>> The key thing is to offer up a use case that warrants support for large
>> numbers of tables. So far, it has usually been the case that the perceived
>> need for separate tables could easily be met using clustering columns of a
>> single table.
>>
>> Seriously, if you guys can define a legitimate use case that can't easily
>> be handled by a single table, that could get the discussion started.
>>
>> -- Jack Krupansky
>>
>> On Tue, Mar 1, 2016 at 9:11 AM, Fernando Jimenez <
>> fernando.jimenez@wealth-port.com
>> <javascript:_e(%7B%7D,'cvml','fernando.jimenez@wealth-port.com');>>
>> wrote:
>>
>>> Hi Jack
>>>
>>> Being purposefully developed to only handle up to “a few hundred” tables
>>> is reason enough. I accept that, and likely a use case with many tables was
>>> never really considered. But I would still like to understand the design
>>> choices made so perhaps we gain some confidence level in this upper limit
>>> in the number of tables. The best estimate we have so far is “a few
>>> hundred” which is a bit vague.
>>>
>>> Regarding scaling, I’m not talking about scaling in terms of data
>>> volume, but on how the data is structured. One thousand tables with one row
>>> each is the same data volume as one table with one thousand rows, excluding
>>> any data structures required to maintain the extra tables. But whereas the
>>> first seems likely to bring a Cassandra cluster to its knees, the second
>>> will run happily on a single node cluster in a low end machine.
>>>
>>> We will design our code to use a single table to avoid having nightmares
>>> with this issue. But if there is any authoritative documentation on this
>>> characteristic of Cassandra, I would love to know more.
>>>
>>> FJ
>>>
>>>
>>> On 01 Mar 2016, at 14:23, Jack Krupansky <jack.krupansky@gmail.com
>>> <javascript:_e(%7B%7D,'cvml','jack.krupansky@gmail.com');>> wrote:
>>>
>>> I don't think there are any "reasons behind it." It is simply empirical
>>> experience - as reported here.
>>>
>>> Cassandra scales in two dimension - number of rows per node and number
>>> of nodes. If some source of information lead you to believe otherwise,
>>> please point out the source so that we can endeavor to correct it.
>>>
>>> The exact number of rows per node and tables per node will always have
>>> to be evaluated empirically - a proof of concept implementation, since it
>>> all depends on the mix of capabilities of your hardware combined with your
>>> specific data model, your specific data values, your specific access
>>> patterns, and your specific load. And it also depends on your own personal
>>> tolerance for degradation of latency and throughput - some people might
>>> find a given set of performance  metrics acceptable while other might not.
>>>
>>> -- Jack Krupansky
>>>
>>> On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <
>>> fernando.jimenez@wealth-port.com
>>> <javascript:_e(%7B%7D,'cvml','fernando.jimenez@wealth-port.com');>>
>>> wrote:
>>>
>>>> Hi Tommaso
>>>>
>>>> It’s not that I _need_ a large number of tables. This approach maps
>>>> easily to the problem we are trying to solve, but it’s becoming clear it’s
>>>> not the right approach.
>>>>
>>>> At the moment I’m trying to understand the limitations in Cassandra
>>>> regarding number of Tables and the reasons behind it. I’ve come to the
>>>> email list as my Google-fu is not giving me what I’m looking for :(
>>>>
>>>> FJ
>>>>
>>>>
>>>>
>>>> On 01 Mar 2016, at 09:36, tommaso barbugli <tbarbugli@gmail.com
>>>> <javascript:_e(%7B%7D,'cvml','tbarbugli@gmail.com');>> wrote:
>>>>
>>>> Hi Fernando,
>>>>
>>>> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it
>>>> was a real pain in terms of operations. Repairs were terribly slow, booting
>>>> C* slowed down, and in general tracking table metrics became a bit more work.
>>>> Why do you need this high number of tables?
>>>>
>>>> Tommaso
>>>>
>>>> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <
>>>> fernando.jimenez@wealth-port.com
>>>> <javascript:_e(%7B%7D,'cvml','fernando.jimenez@wealth-port.com');>>
>>>> wrote:
>>>>
>>>>> Hi Jack
>>>>>
>>>>> By entry I mean row
>>>>>
>>>>> Apologies for the “obsolete terminology”. When I first looked at
>>>>> Cassandra it was still on CQL2, and now that I’m looking at it again I’ve
>>>>> defaulted to the terms I already knew. I will bear it in mind and call them
>>>>> tables from now on.
>>>>>
>>>>> Is there any documentation about this limit? for example, I’d be keen
>>>>> to know how much memory is consumed per table, and I’m also curious about
>>>>> the reasons for keeping this in memory. I’m trying to understand the
>>>>> limitations here, rather than challenge them.
>>>>>
>>>>> So far I found nothing in my search, hence why I had to resort to some
>>>>> “load testing” to see what happens when you push the table count high
>>>>>
>>>>> Thanks
>>>>> FJ
>>>>>
>>>>>
>>>>> On 01 Mar 2016, at 06:23, Jack Krupansky <jack.krupansky@gmail.com
>>>>> <javascript:_e(%7B%7D,'cvml','jack.krupansky@gmail.com');>> wrote:
>>>>>
>>>>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>>>>
>>>>> You are using the obsolete terminology of CQL2 and Thrift - column
>>>>> family. With CQL3 you should be creating "tables". The practical
>>>>> recommendation of an upper limit of a few hundred tables across all key
>>>>> spaces remains.
>>>>>
>>>>> Technically you can go higher and technically you can reduce the
>>>>> overhead per table (an undocumented Jira - intentionally undocumented since
>>>>> it is strongly not recommended), but... it is unlikely that you will be
>>>>> happy with the results.
>>>>>
>>>>> What is the nature of the use case?
>>>>>
>>>>> You basically have two choices: an additional cluster column to
>>>>> distinguish categories of table, or separate clusters for each few hundred
>>>>> of tables.
>>>>>
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
>>>>> fernando.jimenez@wealth-port.com
>>>>> <javascript:_e(%7B%7D,'cvml','fernando.jimenez@wealth-port.com');>>
>>>>> wrote:
>>>>>
>>>>>> Hi all
>>>>>>
>>>>>> I have a use case for Cassandra that would require creating a large
>>>>>> number of column families. I have found references to early versions of
>>>>>> Cassandra where each column family would require a fixed amount of memory
>>>>>> on all nodes, effectively imposing an upper limit on the total number of
>>>>>> CFs. I have also seen rumblings that this may have been fixed in later
>>>>>> versions.
>>>>>>
>>>>>> To put the question to rest, I have setup a DSE sandbox and created
>>>>>> some code to generate column families populated with 3,000 entries each.
>>>>>>
>>>>>> Unfortunately I have now hit this issue:
>>>>>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>>>>>>
>>>>>> So I will have to retest against Cassandra 3.0 instead
>>>>>>
>>>>>> However, I would like to understand the limitations regarding
>>>>>> creation of column families.
>>>>>>
>>>>>> * Is there a practical upper limit?
>>>>>> * is this a fixed limit, or does it scale as more nodes are added
>>>>>> into the cluster?
>>>>>> * Is there a difference between one keyspace with thousands of column
>>>>>> families, vs thousands of keyspaces with only a few column families each?
>>>>>>
>>>>>> I haven’t found any hard evidence/documentation to help me here, but
>>>>>> if you can point me in the right direction, I will oblige and RTFM away.
>>>>>>
>>>>>> Many thanks for your help!
>>>>>>
>>>>>> Cheers
>>>>>> FJ
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>

-- 
Cheers,
Brian
http://www.integrallis.com

Re: Practical limit on number of column families

Posted by Eric Stevens <mi...@gmail.com>.
It's definitely not true for every use case involving a large number of
tables, but in many cases where you'd be tempted to do that, taking
whatever would have driven your table naming and adding it instead as a
column in the partition key of a smaller number of tables will meet your
needs. This is especially true if you're looking to solve multi-tenancy,
unless you let your tenants dynamically drive your schema (which is a
separate can of worms).
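
A rough sketch of that idea in CQL (the table and column names here are hypothetical, invented for illustration rather than taken from this thread): instead of one table per tenant, the would-be table name becomes part of the partition key of a single table:

```sql
-- Anti-pattern: one table per tenant
--   CREATE TABLE acme_events (...);
--   CREATE TABLE globex_events (...);

-- Sketch: one table, with the tenant in the partition key
CREATE TABLE events (
    tenant_id text,      -- what would otherwise have driven the table name
    event_id  timeuuid,
    payload   text,
    PRIMARY KEY ((tenant_id), event_id)
);

-- Each tenant's rows live in their own partitions
SELECT * FROM events WHERE tenant_id = 'acme';
```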

On Tue, Mar 1, 2016 at 9:08 AM Jack Krupansky <ja...@gmail.com>
wrote:

> I don't think Cassandra was "purposefully developed" for some target
> number of tables - there is no evidence of any such explicit intent.
> Instead, it would be fair to say that Cassandra was "not purposefully
> developed" with a goal of supporting "large numbers of tables." Sometimes
> features and capabilities come for free or as a side effect of the
> technologies used, but usually specific features and specific capabilities
> (such as large numbers of tables) require explicit intent and explicit
> effort.
>
> One could indeed endeavor to design a data store (I'm not even sure it
> would still be considered a database per se) that supported either large
> numbers of tables or an additional level of storage model in between table
> and row (call it "group" maybe or "sub-table".) But obviously Cassandra was
> not designed with that goal in mind.
>
> Traditionally, a "table" is a defined relation over a set of data.
> Relation and data are distinct concepts. And a relation name is not simply
> a Java-style "object". A relation (table) name is supposed to represent an
> abstraction or entity type, while essentially all of the cases I have heard
> of for wanting thousands (or even hundreds) of tables are trying to use
> table as more of a container for a group of rows for a specific entity
> instance rather than a distinct entity type. Granted, Cassandra is not
> obligated to be limited to the relational model, but Cassandra, especially
> CQL, is intentionally modeled reasonably closely with the relational model
> in terms of the data modeling abstractions even though the storage engine
> is designed to scale across nodes.
>
> You could file a Jira requesting such a feature improvement. And then we
> would see if sentiment has shifted over the years.
>
> The key thing is to offer up a use case that warrants support for large
> numbers of tables. So far, it has usually been the case that the perceived
> need for separate tables could easily be met using clustering columns of a
> single table.
>
> Seriously, if you guys can define a legitimate use case that can't easily
> be handled by a single table, that could get the discussion started.
>
> -- Jack Krupansky
>
> On Tue, Mar 1, 2016 at 9:11 AM, Fernando Jimenez <
> fernando.jimenez@wealth-port.com> wrote:
>
>> Hi Jack
>>
>> Being purposefully developed to only handle up to “a few hundred” tables
>> is reason enough. I accept that, and likely a use case with many tables was
>> never really considered. But I would still like to understand the design
>> choices made so perhaps we gain some confidence level in this upper limit
>> in the number of tables. The best estimate we have so far is “a few
>> hundred” which is a bit vague.
>>
>> Regarding scaling, I’m not talking about scaling in terms of data volume,
>> but about how the data is structured. One thousand tables with one row each is
>> the same data volume as one table with one thousand rows, excluding any
>> data structures required to maintain the extra tables. But whereas the
>> first seems likely to bring a Cassandra cluster to its knees, the second
>> will run happily on a single node cluster in a low end machine.
>>
>> We will design our code to use a single table to avoid having nightmares
>> with this issue. But if there is any authoritative documentation on this
>> characteristic of Cassandra, I would love to know more.
>>
>> FJ
>>
>>
>> On 01 Mar 2016, at 14:23, Jack Krupansky <ja...@gmail.com>
>> wrote:
>>
>> I don't think there are any "reasons behind it." It is simply empirical
>> experience - as reported here.
>>
>> Cassandra scales in two dimensions - number of rows per node and number of
>> nodes. If some source of information led you to believe otherwise, please
>> point out the source so that we can endeavor to correct it.
>>
>> The exact number of rows per node and tables per node will always have to
>> be evaluated empirically - a proof of concept implementation, since it all
>> depends on the mix of capabilities of your hardware combined with your
>> specific data model, your specific data values, your specific access
>> patterns, and your specific load. And it also depends on your own personal
>> tolerance for degradation of latency and throughput - some people might
>> find a given set of performance metrics acceptable while others might not.
>>
>> -- Jack Krupansky
>>
>> On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <
>> fernando.jimenez@wealth-port.com> wrote:
>>
>>> Hi Tommaso
>>>
>>> It’s not that I _need_ a large number of tables. This approach maps
>>> easily to the problem we are trying to solve, but it’s becoming clear it’s
>>> not the right approach.
>>>
>>> At the moment I’m trying to understand the limitations in Cassandra
>>> regarding number of Tables and the reasons behind it. I’ve come to the
>>> email list as my Google-fu is not giving me what I’m looking for :(
>>>
>>> FJ
>>>
>>>
>>>
>>> On 01 Mar 2016, at 09:36, tommaso barbugli <tb...@gmail.com> wrote:
>>>
>>> Hi Fernando,
>>>
>>> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it was
>>> a real pain in terms of operations. Repairs were terribly slow, boot of C*
>>> slowed down, and in general tracking table metrics became a bit more work.
>>> Why do you need this high number of tables?
>>>
>>> Tommaso
>>>
>>> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <
>>> fernando.jimenez@wealth-port.com> wrote:
>>>
>>>> Hi Jack
>>>>
>>>> By entry I mean row
>>>>
>>>> Apologies for the “obsolete terminology”. When I first looked at
>>>> Cassandra it was still on CQL2, and now that I’m looking at it again I’ve
>>>> defaulted to the terms I already knew. I will bear it in mind and call them
>>>> tables from now on.
>>>>
>>>> Is there any documentation about this limit? for example, I’d be keen
>>>> to know how much memory is consumed per table, and I’m also curious about
>>>> the reasons for keeping this in memory. I’m trying to understand the
>>>> limitations here, rather than challenge them.
>>>>
>>>> So far I found nothing in my search, hence why I had to resort to some
>>>> “load testing” to see what happens when you push the table count high
>>>>
>>>> Thanks
>>>> FJ
>>>>
>>>>
>>>> On 01 Mar 2016, at 06:23, Jack Krupansky <ja...@gmail.com>
>>>> wrote:
>>>>
>>>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>>>
>>>> You are using the obsolete terminology of CQL2 and Thrift - column
>>>> family. With CQL3 you should be creating "tables". The practical
>>>> recommendation of an upper limit of a few hundred tables across all key
>>>> spaces remains.
>>>>
>>>> Technically you can go higher and technically you can reduce the
>>>> overhead per table (an undocumented Jira - intentionally undocumented since
>>>> it is strongly not recommended), but... it is unlikely that you will be
>>>> happy with the results.
>>>>
>>>> What is the nature of the use case?
>>>>
>>>> You basically have two choices: an additional clustering column to
>>>> distinguish categories of table, or a separate cluster for each few
>>>> hundred tables.
>>>>
>>>>
>>>> -- Jack Krupansky
>>>>
>>>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
>>>> fernando.jimenez@wealth-port.com> wrote:
>>>>
>>>>> Hi all
>>>>>
>>>>> I have a use case for Cassandra that would require creating a large
>>>>> number of column families. I have found references to early versions of
>>>>> Cassandra where each column family would require a fixed amount of memory
>>>>> on all nodes, effectively imposing an upper limit on the total number of
>>>>> CFs. I have also seen rumblings that this may have been fixed in later
>>>>> versions.
>>>>>
>>>>> To put the question to rest, I have set up a DSE sandbox and created
>>>>> some code to generate column families populated with 3,000 entries each.
>>>>>
>>>>> Unfortunately I have now hit this issue:
>>>>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>>>>>
>>>>> So I will have to retest against Cassandra 3.0 instead
>>>>>
>>>>> However, I would like to understand the limitations regarding creation
>>>>> of column families.
>>>>>
>>>>> * Is there a practical upper limit?
>>>>> * is this a fixed limit, or does it scale as more nodes are added into
>>>>> the cluster?
>>>>> * Is there a difference between one keyspace with thousands of column
>>>>> families, vs thousands of keyspaces with only a few column families each?
>>>>>
>>>>> I haven’t found any hard evidence/documentation to help me here, but
>>>>> if you can point me in the right direction, I will oblige and RTFM away.
>>>>>
>>>>> Many thanks for your help!
>>>>>
>>>>> Cheers
>>>>> FJ
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>

Re: Practical limit on number of column families

Posted by Jack Krupansky <ja...@gmail.com>.
I don't think Cassandra was "purposefully developed" for some target number
of tables - there is no evidence of any such explicit intent. Instead,
it would be fair to say that Cassandra was "not purposefully developed"
with a goal of supporting "large numbers of tables." Sometimes features and
capabilities come for free or as a side effect of the technologies used,
but usually specific features and specific capabilities (such as large
numbers of tables) require explicit intent and explicit effort.

One could indeed endeavor to design a data store (I'm not even sure it
would still be considered a database per se) that supported either large
numbers of tables or an additional level of storage model in between table
and row (call it "group" maybe or "sub-table".) But obviously Cassandra was
not designed with that goal in mind.

Traditionally, a "table" is a defined relation over a set of data. Relation
and data are distinct concepts. And a relation name is not simply a
Java-style "object". A relation (table) name is supposed to represent an
abstraction or entity type, while essentially all of the cases I have heard
of for wanting thousands (or even hundreds) of tables are trying to use
table as more of a container for a group of rows for a specific entity
instance rather than a distinct entity type. Granted, Cassandra is not
obligated to be limited to the relational model, but Cassandra, especially
CQL, is intentionally modeled reasonably closely with the relational model
in terms of the data modeling abstractions even though the storage engine
is designed to scale across nodes.

You could file a Jira requesting such a feature improvement. And then we
would see if sentiment has shifted over the years.

The key thing is to offer up a use case that warrants support for large
numbers of tables. So far, it has usually been the case that the perceived
need for separate tables could easily be met using clustering columns of a
single table.
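
As a hypothetical sketch of that clustering-column approach (names invented for illustration, not taken from any poster's schema), the per-table split can usually be expressed as a clustering column instead:

```sql
-- Rather than a separate table per category:
CREATE TABLE measurements (
    device_id text,
    category  text,      -- distinguishes what would have been separate tables
    ts        timestamp,
    value     double,
    PRIMARY KEY ((device_id), category, ts)
);

-- Queries scoped to one category remain efficient:
SELECT * FROM measurements WHERE device_id = 'd1' AND category = 'temp';
```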

Seriously, if you guys can define a legitimate use case that can't easily
be handled by a single table, that could get the discussion started.

-- Jack Krupansky

On Tue, Mar 1, 2016 at 9:11 AM, Fernando Jimenez <
fernando.jimenez@wealth-port.com> wrote:

> Hi Jack
>
> Being purposefully developed to only handle up to “a few hundred” tables
> is reason enough. I accept that, and likely a use case with many tables was
> never really considered. But I would still like to understand the design
> choices made so perhaps we gain some confidence level in this upper limit
> in the number of tables. The best estimate we have so far is “a few
> hundred” which is a bit vague.
>
> Regarding scaling, I’m not talking about scaling in terms of data volume,
> but about how the data is structured. One thousand tables with one row each is
> the same data volume as one table with one thousand rows, excluding any
> data structures required to maintain the extra tables. But whereas the
> first seems likely to bring a Cassandra cluster to its knees, the second
> will run happily on a single node cluster in a low end machine.
>
> We will design our code to use a single table to avoid having nightmares
> with this issue. But if there is any authoritative documentation on this
> characteristic of Cassandra, I would love to know more.
>
> FJ
>
>
> On 01 Mar 2016, at 14:23, Jack Krupansky <ja...@gmail.com> wrote:
>
> I don't think there are any "reasons behind it." It is simply empirical
> experience - as reported here.
>
> Cassandra scales in two dimensions - number of rows per node and number of
> nodes. If some source of information led you to believe otherwise, please
> point out the source so that we can endeavor to correct it.
>
> The exact number of rows per node and tables per node will always have to
> be evaluated empirically - a proof of concept implementation, since it all
> depends on the mix of capabilities of your hardware combined with your
> specific data model, your specific data values, your specific access
> patterns, and your specific load. And it also depends on your own personal
> tolerance for degradation of latency and throughput - some people might
> find a given set of performance metrics acceptable while others might not.
>
> -- Jack Krupansky
>
> On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <
> fernando.jimenez@wealth-port.com> wrote:
>
>> Hi Tommaso
>>
>> It’s not that I _need_ a large number of tables. This approach maps
>> easily to the problem we are trying to solve, but it’s becoming clear it’s
>> not the right approach.
>>
>> At the moment I’m trying to understand the limitations in Cassandra
>> regarding number of Tables and the reasons behind it. I’ve come to the
>> email list as my Google-fu is not giving me what I’m looking for :(
>>
>> FJ
>>
>>
>>
>> On 01 Mar 2016, at 09:36, tommaso barbugli <tb...@gmail.com> wrote:
>>
>> Hi Fernando,
>>
>> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it was
>> a real pain in terms of operations. Repairs were terribly slow, boot of C*
>> slowed down, and in general tracking table metrics became a bit more work.
>> Why do you need this high number of tables?
>>
>> Tommaso
>>
>> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <
>> fernando.jimenez@wealth-port.com> wrote:
>>
>>> Hi Jack
>>>
>>> By entry I mean row
>>>
>>> Apologies for the “obsolete terminology”. When I first looked at
>>> Cassandra it was still on CQL2, and now that I’m looking at it again I’ve
>>> defaulted to the terms I already knew. I will bear it in mind and call them
>>> tables from now on.
>>>
>>> Is there any documentation about this limit? for example, I’d be keen to
>>> know how much memory is consumed per table, and I’m also curious about the
>>> reasons for keeping this in memory. I’m trying to understand the
>>> limitations here, rather than challenge them.
>>>
>>> So far I found nothing in my search, hence why I had to resort to some
>>> “load testing” to see what happens when you push the table count high
>>>
>>> Thanks
>>> FJ
>>>
>>>
>>> On 01 Mar 2016, at 06:23, Jack Krupansky <ja...@gmail.com>
>>> wrote:
>>>
>>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>>
>>> You are using the obsolete terminology of CQL2 and Thrift - column
>>> family. With CQL3 you should be creating "tables". The practical
>>> recommendation of an upper limit of a few hundred tables across all key
>>> spaces remains.
>>>
>>> Technically you can go higher and technically you can reduce the
>>> overhead per table (an undocumented Jira - intentionally undocumented since
>>> it is strongly not recommended), but... it is unlikely that you will be
>>> happy with the results.
>>>
>>> What is the nature of the use case?
>>>
>>> You basically have two choices: an additional clustering column to
>>> distinguish categories of table, or a separate cluster for each few
>>> hundred tables.
>>>
>>>
>>> -- Jack Krupansky
>>>
>>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
>>> fernando.jimenez@wealth-port.com> wrote:
>>>
>>>> Hi all
>>>>
>>>> I have a use case for Cassandra that would require creating a large
>>>> number of column families. I have found references to early versions of
>>>> Cassandra where each column family would require a fixed amount of memory
>>>> on all nodes, effectively imposing an upper limit on the total number of
>>>> CFs. I have also seen rumblings that this may have been fixed in later
>>>> versions.
>>>>
>>>> To put the question to rest, I have set up a DSE sandbox and created
>>>> some code to generate column families populated with 3,000 entries each.
>>>>
>>>> Unfortunately I have now hit this issue:
>>>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>>>>
>>>> So I will have to retest against Cassandra 3.0 instead
>>>>
>>>> However, I would like to understand the limitations regarding creation
>>>> of column families.
>>>>
>>>> * Is there a practical upper limit?
>>>> * is this a fixed limit, or does it scale as more nodes are added into
>>>> the cluster?
>>>> * Is there a difference between one keyspace with thousands of column
>>>> families, vs thousands of keyspaces with only a few column families each?
>>>>
>>>> I haven’t found any hard evidence/documentation to help me here, but if
>>>> you can point me in the right direction, I will oblige and RTFM away.
>>>>
>>>> Many thanks for your help!
>>>>
>>>> Cheers
>>>> FJ
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Re: Practical limit on number of column families

Posted by Fernando Jimenez <fe...@wealth-port.com>.
Hi Jack

Being purposefully developed to only handle up to “a few hundred” tables is reason enough. I accept that, and likely a use case with many tables was never really considered. But I would still like to understand the design choices made so perhaps we gain some confidence level in this upper limit in the number of tables. The best estimate we have so far is “a few hundred” which is a bit vague. 

Regarding scaling, I’m not talking about scaling in terms of data volume, but about how the data is structured. One thousand tables with one row each is the same data volume as one table with one thousand rows, excluding any data structures required to maintain the extra tables. But whereas the first seems likely to bring a Cassandra cluster to its knees, the second will run happily on a single node cluster in a low end machine.

We will design our code to use a single table to avoid having nightmares with this issue. But if there is any authoritative documentation on this characteristic of Cassandra, I would love to know more.

FJ


> On 01 Mar 2016, at 14:23, Jack Krupansky <ja...@gmail.com> wrote:
> 
> I don't think there are any "reasons behind it." It is simply empirical experience - as reported here.
> 
> Cassandra scales in two dimensions - number of rows per node and number of nodes. If some source of information led you to believe otherwise, please point out the source so that we can endeavor to correct it.
> 
> The exact number of rows per node and tables per node will always have to be evaluated empirically - a proof of concept implementation, since it all depends on the mix of capabilities of your hardware combined with your specific data model, your specific data values, your specific access patterns, and your specific load. And it also depends on your own personal tolerance for degradation of latency and throughput - some people might find a given set of performance metrics acceptable while others might not.
> 
> -- Jack Krupansky
> 
> On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <fernando.jimenez@wealth-port.com <ma...@wealth-port.com>> wrote:
> Hi Tommaso
> 
> It’s not that I _need_ a large number of tables. This approach maps easily to the problem we are trying to solve, but it’s becoming clear it’s not the right approach.
> 
> At the moment I’m trying to understand the limitations in Cassandra regarding number of Tables and the reasons behind it. I’ve come to the email list as my Google-fu is not giving me what I’m looking for :(
> 
> FJ
> 
> 
> 
>> On 01 Mar 2016, at 09:36, tommaso barbugli <tbarbugli@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Hi Fernando,
>> 
>> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it was a real pain in terms of operations. Repairs were terribly slow, boot of C* slowed down, and in general tracking table metrics became a bit more work. Why do you need this high number of tables?
>> 
>> Tommaso
>> 
>> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <fernando.jimenez@wealth-port.com <ma...@wealth-port.com>> wrote:
>> Hi Jack
>> 
>> By entry I mean row
>> 
>> Apologies for the “obsolete terminology”. When I first looked at Cassandra it was still on CQL2, and now that I’m looking at it again I’ve defaulted to the terms I already knew. I will bear it in mind and call them tables from now on.
>> 
>> Is there any documentation about this limit? for example, I’d be keen to know how much memory is consumed per table, and I’m also curious about the reasons for keeping this in memory. I’m trying to understand the limitations here, rather than challenge them.
>> 
>> So far I found nothing in my search, hence why I had to resort to some “load testing” to see what happens when you push the table count high
>> 
>> Thanks
>> FJ
>> 
>> 
>>> On 01 Mar 2016, at 06:23, Jack Krupansky <jack.krupansky@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>> 
>>> You are using the obsolete terminology of CQL2 and Thrift - column family. With CQL3 you should be creating "tables". The practical recommendation of an upper limit of a few hundred tables across all key spaces remains.
>>> 
>>> Technically you can go higher and technically you can reduce the overhead per table (an undocumented Jira - intentionally undocumented since it is strongly not recommended), but... it is unlikely that you will be happy with the results.
>>> 
>>> What is the nature of the use case?
>>> 
>>> You basically have two choices: an additional clustering column to distinguish categories of table, or a separate cluster for each few hundred tables.
>>> 
>>> 
>>> -- Jack Krupansky
>>> 
>>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <fernando.jimenez@wealth-port.com <ma...@wealth-port.com>> wrote:
>>> Hi all
>>> 
>>> I have a use case for Cassandra that would require creating a large number of column families. I have found references to early versions of Cassandra where each column family would require a fixed amount of memory on all nodes, effectively imposing an upper limit on the total number of CFs. I have also seen rumblings that this may have been fixed in later versions.
>>> 
>>> To put the question to rest, I have set up a DSE sandbox and created some code to generate column families populated with 3,000 entries each.
>>> 
>>> Unfortunately I have now hit this issue: https://issues.apache.org/jira/browse/CASSANDRA-9291 <https://issues.apache.org/jira/browse/CASSANDRA-9291>
>>> 
>>> So I will have to retest against Cassandra 3.0 instead
>>> 
>>> However, I would like to understand the limitations regarding creation of column families. 
>>> 
>>> 	* Is there a practical upper limit? 
>>> 	* is this a fixed limit, or does it scale as more nodes are added into the cluster? 
>>> 	* Is there a difference between one keyspace with thousands of column families, vs thousands of keyspaces with only a few column families each?
>>> 
>>> I haven’t found any hard evidence/documentation to help me here, but if you can point me in the right direction, I will oblige and RTFM away.
>>> 
>>> Many thanks for your help!
>>> 
>>> Cheers
>>> FJ
>>> 
>>> 
>>> 
>> 
>> 
> 
> 


Re: Practical limit on number of column families

Posted by Jack Krupansky <ja...@gmail.com>.
I don't think there are any "reasons behind it." It is simply empirical
experience - as reported here.

Cassandra scales in two dimensions - number of rows per node and number of
nodes. If some source of information led you to believe otherwise, please
point out the source so that we can endeavor to correct it.

The exact number of rows per node and tables per node will always have to
be evaluated empirically - a proof of concept implementation, since it all
depends on the mix of capabilities of your hardware combined with your
specific data model, your specific data values, your specific access
patterns, and your specific load. And it also depends on your own personal
tolerance for degradation of latency and throughput - some people might
find a given set of performance metrics acceptable while others might not.

-- Jack Krupansky

On Tue, Mar 1, 2016 at 3:54 AM, Fernando Jimenez <
fernando.jimenez@wealth-port.com> wrote:

> Hi Tommaso
>
> It’s not that I _need_ a large number of tables. This approach maps easily
> to the problem we are trying to solve, but it’s becoming clear it’s not the
> right approach.
>
> At the moment I’m trying to understand the limitations in Cassandra
> regarding number of Tables and the reasons behind it. I’ve come to the
> email list as my Google-fu is not giving me what I’m looking for :(
>
> FJ
>
>
>
> On 01 Mar 2016, at 09:36, tommaso barbugli <tb...@gmail.com> wrote:
>
> Hi Fernando,
>
> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it was a
> real pain in terms of operations. Repairs were terribly slow, boot of C*
> slowed down, and in general tracking table metrics became a bit more work.
> Why do you need this high number of tables?
>
> Tommaso
>
> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <
> fernando.jimenez@wealth-port.com> wrote:
>
>> Hi Jack
>>
>> By entry I mean row
>>
>> Apologies for the “obsolete terminology”. When I first looked at
>> Cassandra it was still on CQL2, and now that I’m looking at it again I’ve
>> defaulted to the terms I already knew. I will bear it in mind and call them
>> tables from now on.
>>
>> Is there any documentation about this limit? for example, I’d be keen to
>> know how much memory is consumed per table, and I’m also curious about the
>> reasons for keeping this in memory. I’m trying to understand the
>> limitations here, rather than challenge them.
>>
>> So far I found nothing in my search, hence why I had to resort to some
>> “load testing” to see what happens when you push the table count high
>>
>> Thanks
>> FJ
>>
>>
>> On 01 Mar 2016, at 06:23, Jack Krupansky <ja...@gmail.com>
>> wrote:
>>
>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>>
>> You are using the obsolete terminology of CQL2 and Thrift - column
>> family. With CQL3 you should be creating "tables". The practical
>> recommendation of an upper limit of a few hundred tables across all key
>> spaces remains.
>>
>> Technically you can go higher and technically you can reduce the overhead
>> per table (an undocumented Jira - intentionally undocumented since it is
>> strongly not recommended), but... it is unlikely that you will be happy
>> with the results.
>>
>> What is the nature of the use case?
>>
>> You basically have two choices: an additional clustering column to
>> distinguish categories of table, or a separate cluster for each few
>> hundred tables.
>>
>>
>> -- Jack Krupansky
>>
>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <
>> fernando.jimenez@wealth-port.com> wrote:
>>
>>> Hi all
>>>
>>> I have a use case for Cassandra that would require creating a large
>>> number of column families. I have found references to early versions of
>>> Cassandra where each column family would require a fixed amount of memory
>>> on all nodes, effectively imposing an upper limit on the total number of
>>> CFs. I have also seen rumblings that this may have been fixed in later
>>> versions.
>>>
>>> To put the question to rest, I have set up a DSE sandbox and created some
>>> code to generate column families populated with 3,000 entries each.
>>>
>>> Unfortunately I have now hit this issue:
>>> https://issues.apache.org/jira/browse/CASSANDRA-9291
>>>
>>> So I will have to retest against Cassandra 3.0 instead
>>>
>>> However, I would like to understand the limitations regarding creation
>>> of column families.
>>>
>>> * Is there a practical upper limit?
>>> * is this a fixed limit, or does it scale as more nodes are added into
>>> the cluster?
>>> * Is there a difference between one keyspace with thousands of column
>>> families, vs thousands of keyspaces with only a few column families each?
>>>
>>> I haven’t found any hard evidence/documentation to help me here, but if
>>> you can point me in the right direction, I will oblige and RTFM away.
>>>
>>> Many thanks for your help!
>>>
>>> Cheers
>>> FJ
>>>
>>>
>>>
>>
>>
>
>

Re: Practical limit on number of column families

Posted by Fernando Jimenez <fe...@wealth-port.com>.
Hi Tommaso

It’s not that I _need_ a large number of tables. This approach maps easily to the problem we are trying to solve, but it’s becoming clear it’s not the right approach.

At the moment I’m trying to understand the limitations in Cassandra regarding number of Tables and the reasons behind it. I’ve come to the email list as my Google-fu is not giving me what I’m looking for :(

FJ



> On 01 Mar 2016, at 09:36, tommaso barbugli <tb...@gmail.com> wrote:
> 
> Hi Fernando,
> 
> I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, it was a real pain in terms of operations. Repairs were terribly slow, boot of C* slowed down, and in general tracking table metrics became a bit more work. Why do you need this high number of tables?
> 
> Tommaso
> 
> On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <fernando.jimenez@wealth-port.com <ma...@wealth-port.com>> wrote:
> Hi Jack
> 
> By entry I mean row
> 
> Apologies for the “obsolete terminology”. When I first looked at Cassandra it was still on CQL2, and now that I’m looking at it again I’ve defaulted to the terms I already knew. I will bear it in mind and call them tables from now on.
> 
> Is there any documentation about this limit? for example, I’d be keen to know how much memory is consumed per table, and I’m also curious about the reasons for keeping this in memory. I’m trying to understand the limitations here, rather than challenge them.
> 
> So far I found nothing in my search, hence why I had to resort to some “load testing” to see what happens when you push the table count high
> 
> Thanks
> FJ
> 
> 
>> On 01 Mar 2016, at 06:23, Jack Krupansky <jack.krupansky@gmail.com <ma...@gmail.com>> wrote:
>> 
>> 3,000 entries? What's an "entry"? Do you mean row, column, or... what?
>> 
>> You are using the obsolete terminology of CQL2 and Thrift - column family. With CQL3 you should be creating "tables". The practical recommendation of an upper limit of a few hundred tables across all key spaces remains.
>> 
>> Technically you can go higher and technically you can reduce the overhead per table (an undocumented Jira - intentionally undocumented since it is strongly not recommended), but... it is unlikely that you will be happy with the results.
>> 
>> What is the nature of the use case?
>> 
>> You basically have two choices: an additional clustering column to distinguish categories of table, or a separate cluster for each few hundred tables.
>> 
>> 
>> -- Jack Krupansky
>> 
>> On Mon, Feb 29, 2016 at 12:30 PM, Fernando Jimenez <fernando.jimenez@wealth-port.com <ma...@wealth-port.com>> wrote:
>> Hi all
>> 
>> I have a use case for Cassandra that would require creating a large number of column families. I have found references to early versions of Cassandra where each column family would require a fixed amount of memory on all nodes, effectively imposing an upper limit on the total number of CFs. I have also seen rumblings that this may have been fixed in later versions.
>> 
>> To put the question to rest, I have setup a DSE sandbox and created some code to generate column families populated with 3,000 entries each.
>> 
>> Unfortunately I have now hit this issue: https://issues.apache.org/jira/browse/CASSANDRA-9291 <https://issues.apache.org/jira/browse/CASSANDRA-9291>
>> 
>> So I will have to retest against Cassandra 3.0 instead
>> 
>> However, I would like to understand the limitations regarding creation of column families. 
>> 
>> 	* Is there a practical upper limit? 
>> 	* is this a fixed limit, or does it scale as more nodes are added into the cluster? 
>> 	* Is there a difference between one keyspace with thousands of column families, vs thousands of keyspaces with only a few column families each?
>> 
>> I haven’t found any hard evidence/documentation to help me here, but if you can point me in the right direction, I will oblige and RTFM away.
>> 
>> Many thanks for your help!
>> 
>> Cheers
>> FJ
>> 
>> 
>> 
> 
> 


Re: Practical limit on number of column families

Posted by tommaso barbugli <tb...@gmail.com>.
Hi Fernando,

I used to have a cluster with ~300 tables (1 keyspace) on C* 2.0, and it was a
real pain in terms of operations. Repairs were terribly slow, C* startup
slowed down, and in general tracking table metrics became a bit more work.
Why do you need such a high number of tables?

Tommaso

On Tue, Mar 1, 2016 at 9:16 AM, Fernando Jimenez <
fernando.jimenez@wealth-port.com> wrote:

> Hi Jack
>
> By entry I mean row
>
> Apologies for the “obsolete terminology”. When I first looked at Cassandra
> it was still on CQL2, and now that I’m looking at it again I’ve defaulted
> to the terms I already knew. I will bear it in mind and call them tables
> from now on.
>
> Is there any documentation about this limit? For example, I’d be keen to
> know how much memory is consumed per table, and I’m also curious about the
> reasons for keeping this in memory. I’m trying to understand the
> limitations here, rather than challenge them.
>
> So far my search has turned up nothing, which is why I resorted to some
> “load testing” to see what happens when you push the table count high.
>
> Thanks
> FJ

Re: Practical limit on number of column families

Posted by Fernando Jimenez <fe...@wealth-port.com>.
Hi Jack

By entry I mean row

Apologies for the “obsolete terminology”. When I first looked at Cassandra it was still on CQL2, and now that I’m looking at it again I’ve defaulted to the terms I already knew. I will bear it in mind and call them tables from now on.

Is there any documentation about this limit? For example, I’d be keen to know how much memory is consumed per table, and I’m also curious about the reasons for keeping this in memory. I’m trying to understand the limitations here, rather than challenge them.

So far my search has turned up nothing, which is why I resorted to some “load testing” to see what happens when you push the table count high.

Thanks
FJ


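Jack's suggested alternative earlier in the thread, a single table with an additional key column to distinguish categories rather than one table per category, can be sketched in CQL. The `events_by_category` schema below is purely illustrative (no table or column names were given in the thread):

```sql
-- Instead of one table per category (events_login, events_purchase, ...),
-- fold the category into the partition key of a single table:
CREATE TABLE events_by_category (
    category  text,      -- replaces the per-category table name
    event_id  timeuuid,
    payload   text,
    PRIMARY KEY ((category), event_id)
);

-- Queries remain per-category by filtering on the partition key:
-- SELECT * FROM events_by_category WHERE category = 'login';
```

Each category then maps to a set of partitions rather than to a table, which sidesteps the per-table overhead discussed above, at the cost of all categories sharing one schema.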