Posted to user@cassandra.apache.org by tommaso barbugli <tb...@gmail.com> on 2014/07/02 10:13:41 UTC
keyspace with hundreds of columnfamilies
Hi,
Are there any known issues or shortcomings with organising data in hundreds
of column families?
At present I am running with 300 column families, but I expect that to
grow to a couple of thousand.
Is this discouraged or unsupported? (I am using Cassandra 2.0.)
Thanks
Tommaso
Re: keyspace with hundreds of columnfamilies
Posted by tommaso barbugli <tb...@gmail.com>.
Hi,
I am building a sort of DB as a service (more precisely, one DB table as a
service), and I want every user to have their own storage, isolated as much
as possible (and to give them some freedom in terms of schema customisation
and the ability to build secondary (2i) indexes).
Do you know what kind of memory cost we are talking about? Is this going to
affect clients too (I am using the DataStax Python driver), e.g. by flooding
them with metadata?
Thanks
Tommaso
2014-07-02 13:22 GMT+02:00 Jonathan Lacefield <jl...@datastax.com>:
> Hello
> There is memory overhead for each column family. This type of
> configuration could cause heap issues. What is driving the
> requirement for so many CFs?
>
> > On Jul 2, 2014, at 4:14 AM, tommaso barbugli <tb...@gmail.com> wrote:
> >
> > Hi,
> > Are there any known issues or shortcomings with organising data in hundreds of column families?
> > At present I am running with 300 column families, but I expect that to grow to a couple of thousand.
> > Is this discouraged or unsupported? (I am using Cassandra 2.0.)
> >
> > Thanks
> > Tommaso
>
Re: keyspace with hundreds of columnfamilies
Posted by Jonathan Lacefield <jl...@datastax.com>.
Hello
There is memory overhead for each column family. This type of
configuration could cause heap issues. What is driving the
requirement for so many CFs?
> On Jul 2, 2014, at 4:14 AM, tommaso barbugli <tb...@gmail.com> wrote:
>
> Hi,
> Are there any known issues or shortcomings with organising data in hundreds of column families?
> At present I am running with 300 column families, but I expect that to grow to a couple of thousand.
> Is this discouraged or unsupported? (I am using Cassandra 2.0.)
>
> Thanks
> Tommaso
Re: keyspace with hundreds of columnfamilies
Posted by Jack Krupansky <ja...@basetechnology.com>.
If your 1K tables might grow to 5 or 10K, then doesn’t that mean you would be trying to add columns, later, after you’ve populated your data? If so, that would argue for using one or more map columns, to accommodate the dynamic addition of pseudo-columns.
Once again, look at your queries (as they are today and as they will be in the future as you expand the data) since they will be your ultimate guide as to how to model your data.
And drill deeper into how you will be inserting and updating the data in “groups” – that will guide the data modeling as well. What will the typical update use cases look like?
By all means, start simple, but also be careful not to paint yourself into a corner. Alternatively, be prepared to throw away entire implementations as your conceptualization of the data evolves.
-- Jack Krupansky
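The map-column suggestion above can be sketched roughly as follows. This is only an illustration: the keyspace `app`, the table `user_data`, and the column `attrs` are hypothetical names, and the snippet just builds CQL strings rather than talking to a cluster. Each map key plays the role of a "pseudo-column" that can be added later without altering the schema.

```python
# Hypothetical names throughout: keyspace "app", table "user_data", column "attrs".
# The statements are plain CQL strings; nothing here connects to Cassandra.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS app.user_data (
    id uuid PRIMARY KEY,
    name text,
    attrs map<text, text>
)
"""


def set_attr_statement(table="app.user_data"):
    """Build a CQL UPDATE that writes one map entry, i.e. one pseudo-column.

    The three '?' bind markers stand for (map key, map value, row id) and are
    meant to be filled in by a prepared statement.
    """
    return f"UPDATE {table} SET attrs[?] = ? WHERE id = ?"


print(CREATE_TABLE.strip())
print(set_attr_statement())
```

Whether a map fits depends on the queries: map entries are read and written per row, so this is assumed to suit small values such as the short strings mentioned in this thread.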
From: tommaso barbugli
Sent: Saturday, July 12, 2014 3:12 PM
To: user@cassandra.apache.org
Subject: Re: keyspace with hundreds of columnfamilies
hi Jack
thank you for your clear answer!
On Saturday, 12 July 2014, Jack Krupansky <ja...@basetechnology.com> wrote:
1. What does your data look like – 100 small integers or short strings and dates, or... 100 massive blobs?
It will be only small, short strings/varints; no blobs or nested data.
2. What operations are you doing on those rows – reading and updating individual columns, or mostly full-row upserts?
Mostly reading and writing groups of columns (previously I had those sets of columns in different CFs).
3. 100 columns in a CQL row is not so unreasonable, per se.
4. The ultimate answer to any “how will it perform” question is to do a “proof of concept” implementation since it really all depends on your actual data and hardware setup, such as memory, cpu, I/O, and networking – IOW, all the non-Cassandra factors can easily dwarf Cassandra itself.
5. As far as 1K tables with 10 columns vs. 100 tables with 100 columns – it should primarily be your queries (and updates) that drive the decision. Do fewer tables and more columns make your queries (and updates) a lot simpler and cleaner?
Yes, code-wise it does; I am just afraid that I will run into problems when 1k CFs grow to 5k or 10k.
-- Jack Krupansky
From: tommaso barbugli
Sent: Saturday, July 12, 2014 7:58 AM
To: user@cassandra.apache.org
Subject: Re: keyspace with hundreds of columnfamilies
hi,
How is a table with hundreds of columns going to perform?
I am moving from 1k column families, each with 10 columns, to 100 CFs, each with 100 columns.
thank you
tommaso
On Friday, 11 July 2014, Sourabh Agrawal <iitr.sourabh@gmail.com> wrote:
Yes, what about CQL style columns? Please clarify
On Sat, Jul 5, 2014 at 12:32 PM, tommaso barbugli <tbarbugli@gmail.com> wrote:
Yes, my question was about CQL-style columns.
2014-07-04 12:40 GMT+02:00 Jens Rantil <jens.rantil@tink.se>:
Just so you guys aren't misunderstanding each other; Tommaso, you were not referring to CQL-style columns, right?
/J
On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <romain.hardouin@urssaf.fr> wrote:
Cassandra can handle many more columns (e.g. time series).
So 100 columns is OK.
Best,
Romain
tommaso barbugli <tbarbugli@gmail.com> wrote on 03/07/2014 21:55:18:
> From: tommaso barbugli <tbarbugli@gmail.com>
> To: user@cassandra.apache.org
> Date: 03/07/2014 21:55
> Subject: Re: keyspace with hundreds of columnfamilies
>
> thank you for the replies; I am rethinking the schema design, one
> possible solution is to "implode" one dimension and get N times fewer CFs.
> With this approach I would come up with (cql) tables with up to 100
> columns; would that be a problem?
>
> Thank You,
> Tommaso
>
--
Sourabh Agrawal
Bangalore
+91 9945657973
--
sent from iphone (sorry for the typos)
Re: keyspace with hundreds of columnfamilies
Posted by tommaso barbugli <tb...@gmail.com>.
hi Jack
thank you for your clear answer!
On Saturday, 12 July 2014, Jack Krupansky <ja...@basetechnology.com> wrote:
> 1. What does your data look like – 100 small integers or short strings
> and dates, or... 100 massive blobs?
>
It will be only small, short strings/varints; no blobs or nested data.
> 2. What operations are you doing on those rows – reading and updating
> individual columns, or mostly full-row upserts?
>
Mostly reading and writing groups of columns (previously I had those sets of
columns in different CFs).
>
> 3. 100 columns in a CQL row is not so unreasonable, per se.
>
> 4. The ultimate answer to any “how will it perform” question is to do a
> “proof of concept” implementation since it really all depends on your
> actual data and hardware setup, such as memory, cpu, I/O, and networking –
> IOW, all the non-Cassandra factors can easily dwarf Cassandra itself.
>
> 5. As far as 1K tables with 10 columns vs. 100 tables with 100 columns –
> it should primarily be your queries (and updates) that drive the decision.
> Do fewer tables and more columns make your queries (and updates) a lot
> simpler and cleaner?
>
Yes, code-wise it does; I am just afraid that I will run into problems when
1k CFs grow to 5k or 10k.
>
> -- Jack Krupansky
>
> *From:* tommaso barbugli <tbarbugli@gmail.com>
> *Sent:* Saturday, July 12, 2014 7:58 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: keyspace with hundreds of columnfamilies
>
> hi,
> How is a table with hundreds of columns going to perform?
>
> I am moving from 1k column families, each with 10 columns, to 100 CFs,
> each with 100 columns.
>
> thank you
> tommaso
>
> On Friday, 11 July 2014, Sourabh Agrawal <iitr.sourabh@gmail.com> wrote:
>
>> Yes, what about CQL style columns? Please clarify
>>
>>
>> On Sat, Jul 5, 2014 at 12:32 PM, tommaso barbugli <tbarbugli@gmail.com> wrote:
>>
>>> Yes, my question was about CQL-style columns.
>>>
>>>
>>> 2014-07-04 12:40 GMT+02:00 Jens Rantil <jens.rantil@tink.se>:
>>>
>>>> Just so you guys aren't misunderstanding each other; Tommaso, you were
>>>> not referring to CQL-style columns, right?
>>>>
>>>> /J
>>>>
>>>>
>>>> On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <romain.hardouin@urssaf.fr> wrote:
>>>>
>>>>> Cassandra can handle many more columns (e.g. time series).
>>>>> So 100 columns is OK.
>>>>>
>>>>> Best,
>>>>> Romain
>>>>>
>>>>>
>>>>>
>>>>> tommaso barbugli <tbarbugli@gmail.com> wrote on 03/07/2014 21:55:18:
>>>>>
>>>>> > From: tommaso barbugli <tbarbugli@gmail.com>
>>>>> > To: user@cassandra.apache.org
>>>>> > Date: 03/07/2014 21:55
>>>>> > Subject: Re: keyspace with hundreds of columnfamilies
>>>>> >
>>>>> > thank you for the replies; I am rethinking the schema design, one
>>>>> > possible solution is to "implode" one dimension and get N times fewer CFs.
>>>>> > With this approach I would come up with (cql) tables with up to 100
>>>>> > columns; would that be a problem?
>>>>> >
>>>>> > Thank You,
>>>>> > Tommaso
>>>>> >
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>> --
>> Sourabh Agrawal
>> Bangalore
>> +91 9945657973
>>
>
>
> --
> sent from iphone (sorry for the typos)
>
--
sent from iphone (sorry for the typos)
Re: keyspace with hundreds of columnfamilies
Posted by Jack Krupansky <ja...@basetechnology.com>.
1. What does your data look like – 100 small integers or short strings and dates, or... 100 massive blobs?
2. What operations are you doing on those rows – reading and updating individual columns, or mostly full-row upserts?
3. 100 columns in a CQL row is not so unreasonable, per se.
4. The ultimate answer to any “how will it perform” question is to do a “proof of concept” implementation since it really all depends on your actual data and hardware setup, such as memory, cpu, I/O, and networking – IOW, all the non-Cassandra factors can easily dwarf Cassandra itself.
5. As far as 1K tables with 10 columns vs. 100 tables with 100 columns – it should primarily be your queries (and updates) that drive the decision. Do fewer tables and more columns make your queries (and updates) a lot simpler and cleaner?
-- Jack Krupansky
From: tommaso barbugli
Sent: Saturday, July 12, 2014 7:58 AM
To: user@cassandra.apache.org
Subject: Re: keyspace with hundreds of columnfamilies
hi,
How is a table with hundreds of columns going to perform?
I am moving from 1k column families, each with 10 columns, to 100 CFs, each with 100 columns.
thank you
tommaso
On Friday, 11 July 2014, Sourabh Agrawal <ii...@gmail.com> wrote:
Yes, what about CQL style columns? Please clarify
On Sat, Jul 5, 2014 at 12:32 PM, tommaso barbugli <tbarbugli@gmail.com> wrote:
Yes, my question was about CQL-style columns.
2014-07-04 12:40 GMT+02:00 Jens Rantil <jens.rantil@tink.se>:
Just so you guys aren't misunderstanding each other; Tommaso, you were not referring to CQL-style columns, right?
/J
On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <romain.hardouin@urssaf.fr> wrote:
Cassandra can handle many more columns (e.g. time series).
So 100 columns is OK.
Best,
Romain
tommaso barbugli <tbarbugli@gmail.com> wrote on 03/07/2014 21:55:18:
> From: tommaso barbugli <tbarbugli@gmail.com>
> To: user@cassandra.apache.org
> Date: 03/07/2014 21:55
> Subject: Re: keyspace with hundreds of columnfamilies
>
> thank you for the replies; I am rethinking the schema design, one
> possible solution is to "implode" one dimension and get N times fewer CFs.
> With this approach I would come up with (cql) tables with up to 100
> columns; would that be a problem?
>
> Thank You,
> Tommaso
>
--
Sourabh Agrawal
Bangalore
+91 9945657973
--
sent from iphone (sorry for the typos)
Re: keyspace with hundreds of columnfamilies
Posted by tommaso barbugli <tb...@gmail.com>.
hi,
How is a table with hundreds of columns going to perform?
I am moving from 1k column families, each with 10 columns, to 100 CFs, each
with 100 columns.
thank you
tommaso
On Friday, 11 July 2014, Sourabh Agrawal <ii...@gmail.com> wrote:
> Yes, what about CQL style columns? Please clarify
>
>
> On Sat, Jul 5, 2014 at 12:32 PM, tommaso barbugli <tbarbugli@gmail.com> wrote:
>
>> Yes, my question was about CQL-style columns.
>>
>>
>> 2014-07-04 12:40 GMT+02:00 Jens Rantil <jens.rantil@tink.se>:
>>
>>> Just so you guys aren't misunderstanding each other; Tommaso, you were
>>> not referring to CQL-style columns, right?
>>>
>>> /J
>>>
>>>
>>> On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <romain.hardouin@urssaf.fr> wrote:
>>>
>>>> Cassandra can handle many more columns (e.g. time series).
>>>> So 100 columns is OK.
>>>>
>>>> Best,
>>>> Romain
>>>>
>>>>
>>>>
>>>> tommaso barbugli <tbarbugli@gmail.com> wrote on 03/07/2014 21:55:18:
>>>>
>>>> > From: tommaso barbugli <tbarbugli@gmail.com>
>>>> > To: user@cassandra.apache.org
>>>> > Date: 03/07/2014 21:55
>>>> > Subject: Re: keyspace with hundreds of columnfamilies
>>>> >
>>>> > thank you for the replies; I am rethinking the schema design, one
>>>> > possible solution is to "implode" one dimension and get N times fewer CFs.
>>>> > With this approach I would come up with (cql) tables with up to 100
>>>> > columns; would that be a problem?
>>>> >
>>>> > Thank You,
>>>> > Tommaso
>>>> >
>>>>
>>>
>>>
>>
>
>
> --
> Sourabh Agrawal
> Bangalore
> +91 9945657973
>
--
sent from iphone (sorry for the typos)
Re: keyspace with hundreds of columnfamilies
Posted by Sourabh Agrawal <ii...@gmail.com>.
Yes, what about CQL style columns? Please clarify
On Sat, Jul 5, 2014 at 12:32 PM, tommaso barbugli <tb...@gmail.com> wrote:
> Yes, my question was about CQL-style columns.
>
>
> 2014-07-04 12:40 GMT+02:00 Jens Rantil <je...@tink.se>:
>
>> Just so you guys aren't misunderstanding each other; Tommaso, you were not
>> referring to CQL-style columns, right?
>>
>> /J
>>
>>
>> On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <
>> romain.hardouin@urssaf.fr> wrote:
>>
>>> Cassandra can handle many more columns (e.g. time series).
>>> So 100 columns is OK.
>>>
>>> Best,
>>> Romain
>>>
>>>
>>>
>>> tommaso barbugli <tb...@gmail.com> wrote on 03/07/2014 21:55:18:
>>>
>>> > From: tommaso barbugli <tb...@gmail.com>
>>> > To: user@cassandra.apache.org
>>> > Date: 03/07/2014 21:55
>>> > Subject: Re: keyspace with hundreds of columnfamilies
>>> >
>>> > thank you for the replies; I am rethinking the schema design, one
>>> > possible solution is to "implode" one dimension and get N times fewer CFs.
>>> > With this approach I would come up with (cql) tables with up to 100
>>> > columns; would that be a problem?
>>> >
>>> > Thank You,
>>> > Tommaso
>>> >
>>>
>>
>>
>
--
Sourabh Agrawal
Bangalore
+91 9945657973
Re: keyspace with hundreds of columnfamilies
Posted by tommaso barbugli <tb...@gmail.com>.
Yes, my question was about CQL-style columns.
2014-07-04 12:40 GMT+02:00 Jens Rantil <je...@tink.se>:
> Just so you guys aren't misunderstanding each other; Tommaso, you were not
> referring to CQL-style columns, right?
>
> /J
>
>
> On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <
> romain.hardouin@urssaf.fr> wrote:
>
>> Cassandra can handle many more columns (e.g. time series).
>> So 100 columns is OK.
>>
>> Best,
>> Romain
>>
>>
>>
>> tommaso barbugli <tb...@gmail.com> wrote on 03/07/2014 21:55:18:
>>
>> > From: tommaso barbugli <tb...@gmail.com>
>> > To: user@cassandra.apache.org
>> > Date: 03/07/2014 21:55
>> > Subject: Re: keyspace with hundreds of columnfamilies
>> >
>> > thank you for the replies; I am rethinking the schema design, one
>> > possible solution is to "implode" one dimension and get N times fewer CFs.
>> > With this approach I would come up with (cql) tables with up to 100
>> > columns; would that be a problem?
>> >
>> > Thank You,
>> > Tommaso
>> >
>>
>
>
Re: keyspace with hundreds of columnfamilies
Posted by Jens Rantil <je...@tink.se>.
Just so you guys aren't misunderstanding each other; Tommaso, you were not
referring to CQL-style columns, right?
/J
On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <ro...@urssaf.fr> wrote:
> Cassandra can handle many more columns (e.g. time series).
> So 100 columns is OK.
>
> Best,
> Romain
>
>
>
> tommaso barbugli <tb...@gmail.com> wrote on 03/07/2014 21:55:18:
>
> > From: tommaso barbugli <tb...@gmail.com>
> > To: user@cassandra.apache.org
> > Date: 03/07/2014 21:55
> > Subject: Re: keyspace with hundreds of columnfamilies
> >
> > thank you for the replies; I am rethinking the schema design, one
> > possible solution is to "implode" one dimension and get N times fewer CFs.
>
> > With this approach I would come up with (cql) tables with up to 100
> > columns; would that be a problem?
> >
> > Thank You,
> > Tommaso
> >
>
Re: keyspace with hundreds of columnfamilies
Posted by Romain HARDOUIN <ro...@urssaf.fr>.
Cassandra can handle many more columns (e.g. time series).
So 100 columns is OK.
Best,
Romain
tommaso barbugli <tb...@gmail.com> wrote on 03/07/2014 21:55:18:
> From: tommaso barbugli <tb...@gmail.com>
> To: user@cassandra.apache.org
> Date: 03/07/2014 21:55
> Subject: Re: keyspace with hundreds of columnfamilies
>
> thank you for the replies; I am rethinking the schema design, one
> possible solution is to "implode" one dimension and get N times fewer CFs.
> With this approach I would come up with (cql) tables with up to 100
> columns; would that be a problem?
>
> Thank You,
> Tommaso
>
Re: keyspace with hundreds of columnfamilies
Posted by Ilya Sviridov <is...@mirantis.com>.
Tommaso, looking at your description of the architecture, an idea came up:
you could shard on the Cassandra client side and write to different
Cassandra clusters to keep the number of column families per cluster reasonable.
With best regards,
Ilya
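A minimal sketch of what client-side sharding could look like, assuming each user (tenant) is pinned to one cluster by a stable hash of their id. The cluster names and the `cluster_for` helper are hypothetical and not part of any driver API; a real setup would map each name to that cluster's contact points and open one driver session per cluster.

```python
import hashlib

# Hypothetical cluster names, used only for illustration.
CLUSTERS = ["cluster-a", "cluster-b", "cluster-c"]


def cluster_for(tenant_id, clusters=CLUSTERS):
    """Pick a cluster deterministically from a stable hash of the tenant id.

    hashlib is used instead of the built-in hash() because the latter is
    randomised per process for strings and would not route consistently.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(clusters)
    return clusters[index]


for tenant in ("alice", "bob", "carol"):
    print(tenant, "->", cluster_for(tenant))
```

Note that plain modulo hashing reassigns most tenants when the number of clusters changes, so a lookup table or consistent hashing may be preferable if clusters are expected to be added later.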
On Thu, Jul 3, 2014 at 10:55 PM, tommaso barbugli <tb...@gmail.com> wrote:
> thank you for the replies; I am rethinking the schema design, one possible
> solution is to "implode" one dimension and get N times fewer CFs.
> With this approach I would come up with (cql) tables with up to 100
> columns; would that be a problem?
>
> Thank You,
> Tommaso
>
>
> 2014-07-02 23:43 GMT+02:00 Jack Krupansky <ja...@basetechnology.com>:
>
>> The official answer, engraved in stone tablets, and carried down from
>> the mountain: “Although having more than dozens or hundreds of tables
>> defined is almost certainly a Bad Idea (just as it is a design smell in a
>> relational database), it's relatively straightforward to allow disabling
>> the SlabAllocator.” Emphasis on “almost certainly a Bad Idea.”
>>
>> See:
>> https://issues.apache.org/jira/browse/CASSANDRA-5935
>> “Allow disabling slab allocation”
>>
>> IOW, this is considered an anti-pattern, but...
>>
>> -- Jack Krupansky
>>
>> *From:* tommaso barbugli <tb...@gmail.com>
>> *Sent:* Wednesday, July 2, 2014 2:16 PM
>> *To:* user@cassandra.apache.org
>> *Subject:* Re: keyspace with hundreds of columnfamilies
>>
>> Hi,
>> thank you for your replies on this; regarding the arena memory, is this a
>> fixed memory allocation or some sort of in-memory caching? I ask because
>> I think that a substantial portion of the column families created will not
>> be queried that frequently (and some will become inactive and stay like
>> that for a really long time)
>>
>> Thank you,
>> Tommaso
>>
>>
>> 2014-07-02 18:35 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
>>
>>> Arena allocation is an improvement feature, not a limitation.
>>> It was introduced in Cassandra 1.0 in order to lower memory
>>> fragmentation (and therefore promotion failure).
>>> AFAIK It's not intended to be tweaked so it might not be a good idea to
>>> change it.
>>>
>>> Best,
>>> Romain
>>>
>>> tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 17:40:18:
>>>
>>> > From: tommaso barbugli <tb...@gmail.com>
>>> > To: user@cassandra.apache.org
>>> > Date: 02/07/2014 17:40
>>> > Subject: Re: keyspace with hundreds of columnfamilies
>>> >
>>> > 1MB per column family sounds pretty bad to me; is this something I
>>> > can tweak/workaround somehow?
>>> >
>>> > Thanks
>>> > Tommaso
>>> >
>>>
>>> > 2014-07-02 17:21 GMT+02:00 Romain HARDOUIN <romain.hardouin@urssaf.fr>:
>>> > The trap is that each CF will consume 1 MB of memory due to arena allocation.
>>> > This might seem harmless, but if you plan thousands of CFs it means
>>> > thousands of megabytes...
>>> > Up to 1,000 CFs I think it could be doable, but not 10,000.
>>> >
>>> > Best,
>>> >
>>> > Romain
>>> >
>>> >
>>> > tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:
>>> >
>>> > > From: tommaso barbugli <tb...@gmail.com>
>>> > > To: user@cassandra.apache.org
>>> > > Date: 02/07/2014 10:14
>>> > > Subject: keyspace with hundreds of columnfamilies
>>> > >
>>> > > Hi,
>>> > > Are there any known issues or shortcomings with organising data in
>>> > > hundreds of column families?
>>> > > At present I am running with 300 column families, but I expect
>>> > > that to grow to a couple of thousand.
>>> > > Is this discouraged or unsupported? (I am using Cassandra 2.0.)
>>> > >
>>> > > Thanks
>>> > > Tommaso
>>>
>>
>>
>
>
Re: keyspace with hundreds of columnfamilies
Posted by tommaso barbugli <tb...@gmail.com>.
Thank you for the replies. I am rethinking the schema design; one possible
solution is to "implode" one dimension and get N times fewer CFs.
With this approach I would end up with (CQL) tables with up to 100
columns; would that be a problem?
Thank You,
Tommaso
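One possible reading of "imploding" a dimension, sketched under loud assumptions: the columns of several narrow tables are folded into a single wide table, with each column prefixed by the name of the group it came from. The `group`/`field` names and the `implode` helper are invented for illustration; the thread does not say how the merge would actually be done.

```python
def implode(narrow_tables):
    """Merge several narrow tables into one wide column list.

    `narrow_tables` maps a (hypothetical) group name to its column names;
    each resulting column is prefixed with the group it came from.
    """
    return [
        f"{group}_{column}"
        for group, columns in narrow_tables.items()
        for column in columns
    ]


# 10 hypothetical narrow tables with 10 columns each become one wide table.
narrow = {f"group{g}": [f"field{f}" for f in range(10)] for g in range(10)}
wide_columns = implode(narrow)

print(len(narrow), "tables ->", 1, "table with", len(wide_columns), "columns")
# Same arithmetic at the scale mentioned in this thread:
# 1,000 CFs x 10 columns == 100 CFs x 100 columns.
print(1000 * 10 == 100 * 100)
```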
2014-07-02 23:43 GMT+02:00 Jack Krupansky <ja...@basetechnology.com>:
> The official answer, engraved in stone tablets, and carried down from
> the mountain: “Although having more than dozens or hundreds of tables
> defined is almost certainly a Bad Idea (just as it is a design smell in a
> relational database), it's relatively straightforward to allow disabling
> the SlabAllocator.” Emphasis on “almost certainly a Bad Idea.”
>
> See:
> https://issues.apache.org/jira/browse/CASSANDRA-5935
> “Allow disabling slab allocation”
>
> IOW, this is considered an anti-pattern, but...
>
> -- Jack Krupansky
>
> *From:* tommaso barbugli <tb...@gmail.com>
> *Sent:* Wednesday, July 2, 2014 2:16 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: keyspace with hundreds of columnfamilies
>
> Hi,
> thank you for your replies on this; regarding the arena memory, is this a
> fixed memory allocation or some sort of in-memory caching? I ask because
> I think that a substantial portion of the column families created will not
> be queried that frequently (and some will become inactive and stay like
> that for a really long time)
>
> Thank you,
> Tommaso
>
>
> 2014-07-02 18:35 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
>
>> Arena allocation is an improvement feature, not a limitation.
>> It was introduced in Cassandra 1.0 in order to lower memory fragmentation
>> (and therefore promotion failure).
>> AFAIK It's not intended to be tweaked so it might not be a good idea to
>> change it.
>>
>> Best,
>> Romain
>>
>> tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 17:40:18:
>>
>> > From: tommaso barbugli <tb...@gmail.com>
>> > To: user@cassandra.apache.org
>> > Date: 02/07/2014 17:40
>> > Subject: Re: keyspace with hundreds of columnfamilies
>> >
>> > 1MB per column family sounds pretty bad to me; is this something I
>> > can tweak/workaround somehow?
>> >
>> > Thanks
>> > Tommaso
>> >
>>
>> > 2014-07-02 17:21 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
>> > The trap is that each CF will consume 1 MB of memory due to arena allocation.
>> > This might seem harmless, but if you plan thousands of CFs it means
>> > thousands of megabytes...
>> > Up to 1,000 CFs I think it could be doable, but not 10,000.
>> >
>> > Best,
>> >
>> > Romain
>> >
>> >
>> > tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:
>> >
>> > > From: tommaso barbugli <tb...@gmail.com>
>> > > To: user@cassandra.apache.org
>> > > Date: 02/07/2014 10:14
>> > > Subject: keyspace with hundreds of columnfamilies
>> > >
>> > > Hi,
>> > > Are there any known issues or shortcomings with organising data in
>> > > hundreds of column families?
>> > > At present I am running with 300 column families, but I expect
>> > > that to grow to a couple of thousand.
>> > > Is this discouraged or unsupported? (I am using Cassandra 2.0.)
>> > >
>> > > Thanks
>> > > Tommaso
>>
>
>
Re: keyspace with hundreds of columnfamilies
Posted by Jack Krupansky <ja...@basetechnology.com>.
The official answer, engraved in stone tablets, and carried down from the mountain: “Although having more than dozens or hundreds of tables defined is almost certainly a Bad Idea (just as it is a design smell in a relational database), it's relatively straightforward to allow disabling the SlabAllocator.” Emphasis on “almost certainly a Bad Idea.”
See:
https://issues.apache.org/jira/browse/CASSANDRA-5935
“Allow disabling slab allocation”
IOW, this is considered an anti-pattern, but...
-- Jack Krupansky
From: tommaso barbugli
Sent: Wednesday, July 2, 2014 2:16 PM
To: user@cassandra.apache.org
Subject: Re: keyspace with hundreds of columnfamilies
Hi,
thank you for your replies on this; regarding the arena memory, is this a fixed memory allocation or some sort of in-memory caching? I ask because I think that a substantial portion of the column families created will not be queried that frequently (and some will become inactive and stay like that for a really long time)
Thank you,
Tommaso
2014-07-02 18:35 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
Arena allocation is an improvement feature, not a limitation.
It was introduced in Cassandra 1.0 in order to lower memory fragmentation (and therefore promotion failure).
AFAIK It's not intended to be tweaked so it might not be a good idea to change it.
Best,
Romain
tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 17:40:18:
> From: tommaso barbugli <tb...@gmail.com>
> To: user@cassandra.apache.org
> Date: 02/07/2014 17:40
> Subject: Re: keyspace with hundreds of columnfamilies
>
> 1MB per column family sounds pretty bad to me; is this something I
> can tweak/workaround somehow?
>
> Thanks
> Tommaso
>
> 2014-07-02 17:21 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
> The trap is that each CF will consume 1 MB of memory due to arena allocation.
> This might seem harmless, but if you plan thousands of CFs it means
> thousands of megabytes...
> Up to 1,000 CFs I think it could be doable, but not 10,000.
>
> Best,
>
> Romain
>
>
> tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:
>
> > From: tommaso barbugli <tb...@gmail.com>
> > To: user@cassandra.apache.org
> > Date: 02/07/2014 10:14
> > Subject: keyspace with hundreds of columnfamilies
> >
> > Hi,
> > Are there any known issues or shortcomings with organising data in
> > hundreds of column families?
> > At present I am running with 300 column families, but I expect
> > that to grow to a couple of thousand.
> > Is this discouraged or unsupported? (I am using Cassandra 2.0.)
> >
> > Thanks
> > Tommaso
Re: keyspace with hundreds of columnfamilies
Posted by tommaso barbugli <tb...@gmail.com>.
Hi,
Thank you for your replies on this. Regarding the arena memory: is this a
fixed memory allocation, or is it some sort of in-memory caching? I ask because
I think that a substantial portion of the column families created will not
be queried that frequently (and some will become inactive and stay like
that for a really long time).
Thank you,
Tommaso
2014-07-02 18:35 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
> Arena allocation is an improvement feature, not a limitation.
> It was introduced in Cassandra 1.0 in order to lower memory fragmentation
> (and therefore promotion failure).
> AFAIK It's not intended to be tweaked so it might not be a good idea to
> change it.
>
> Best,
> Romain
>
> tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 17:40:18:
>
> > From: tommaso barbugli <tb...@gmail.com>
> > To: user@cassandra.apache.org
> > Date: 02/07/2014 17:40
> > Subject: Re: keyspace with hundreds of columnfamilies
> >
> > 1MB per column family sounds pretty bad to me; is this something I
> > can tweak/workaround somehow?
> >
> > Thanks
> > Tommaso
> >
>
> > 2014-07-02 17:21 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
> > The trap is that each CF will consume 1 MB of memory due to arena allocation.
> > This might seem harmless, but if you plan thousands of CFs it means
> > thousands of megabytes...
> > Up to 1,000 CFs I think it could be doable, but not 10,000.
> >
> > Best,
> >
> > Romain
> >
> >
> > tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:
> >
> > > From: tommaso barbugli <tb...@gmail.com>
> > > To: user@cassandra.apache.org
> > > Date: 02/07/2014 10:14
> > > Subject: keyspace with hundreds of columnfamilies
> > >
> > > Hi,
> > > Are there any known issues or shortcomings with organising data in
> > > hundreds of column families?
> > > At present I am running with 300 column families, but I expect
> > > that to grow to a couple of thousand.
> > > Is this discouraged or unsupported? (I am using Cassandra 2.0.)
> > >
> > > Thanks
> > > Tommaso
>
Re: keyspace with hundreds of columnfamilies
Posted by Romain HARDOUIN <ro...@urssaf.fr>.
Arena allocation is an improvement feature, not a limitation.
It was introduced in Cassandra 1.0 in order to lower memory fragmentation
(and therefore promotion failure).
AFAIK it's not intended to be tweaked, so it might not be a good idea to
change it.
Best,
Romain
tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 17:40:18:
> From: tommaso barbugli <tb...@gmail.com>
> To: user@cassandra.apache.org
> Date: 02/07/2014 17:40
> Subject: Re: keyspace with hundreds of columnfamilies
>
> 1MB per column family sounds pretty bad to me; is this something I
> can tweak/workaround somehow?
>
> Thanks
> Tommaso
>
> 2014-07-02 17:21 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
> The trap is that each CF will consume 1 MB of memory due to arena allocation.
> This might seem harmless, but if you plan thousands of CFs it means
> thousands of megabytes...
> Up to 1,000 CFs I think it could be doable, but not 10,000.
>
> Best,
>
> Romain
>
>
> tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:
>
> > From: tommaso barbugli <tb...@gmail.com>
> > To: user@cassandra.apache.org
> > Date: 02/07/2014 10:14
> > Subject: keyspace with hundreds of columnfamilies
> >
> > Hi,
> > Are there any known issues or shortcomings with organising data in
> > hundreds of column families?
> > At present I am running with 300 column families, but I expect
> > that to grow to a couple of thousand.
> > Is this discouraged or unsupported? (I am using Cassandra 2.0.)
> >
> > Thanks
> > Tommaso
Re: keyspace with hundreds of columnfamilies
Posted by tommaso barbugli <tb...@gmail.com>.
1MB per column family sounds pretty bad to me; is this something I can
tweak/workaround somehow?
Thanks
Tommaso
2014-07-02 17:21 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:
> The trap is that each CF will consume 1 MB of memory due to arena
> allocation.
> This might seem harmless, but if you plan thousands of CFs it means
> thousands of megabytes...
> Up to 1,000 CFs I think it could be doable, but not 10,000.
>
> Best,
>
> Romain
>
>
> tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:
>
> > From: tommaso barbugli <tb...@gmail.com>
> > To: user@cassandra.apache.org
> > Date: 02/07/2014 10:14
> > Subject: keyspace with hundreds of columnfamilies
> >
> > Hi,
> > Are there any known issues or shortcomings with organising data in
> > hundreds of column families?
> > At present I am running with 300 column families, but I expect
> > that to grow to a couple of thousand.
> > Is this discouraged or unsupported? (I am using Cassandra 2.0.)
> >
> > Thanks
> > Tommaso
>
RE: keyspace with hundreds of columnfamilies
Posted by Romain HARDOUIN <ro...@urssaf.fr>.
The trap is that each CF will consume 1 MB of memory due to arena
allocation.
This might seem harmless, but if you plan thousands of CFs it means
thousands of megabytes...
Up to 1,000 CFs I think it could be doable, but not 10,000.
Best,
Romain
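The arithmetic behind the estimate above can be sketched as follows. The 1 MB per column family figure is taken from this message and treated as an approximation; the helper name `arena_overhead_mb` is made up for illustration.

```python
# Rough estimate of the arena-allocation overhead discussed in this thread.
# Assumption: roughly 1 MB is reserved per column family, as stated above.
ARENA_MB_PER_CF = 1


def arena_overhead_mb(num_column_families, mb_per_cf=ARENA_MB_PER_CF):
    """Return the estimated memory (in MB) reserved across all column families."""
    return num_column_families * mb_per_cf


# 300 is the current count from the original question; the others are the
# growth scenarios mentioned in the thread.
for n in (300, 1_000, 10_000):
    mb = arena_overhead_mb(n)
    print(f"{n:>6} CFs -> ~{mb:,} MB (~{mb / 1024:.1f} GB)")
```

At 10,000 column families this comes to roughly 10 GB, which would plausibly exceed a typical JVM heap, hence the "doable up to 1,000, but not 10,000" rule of thumb.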
tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:
> From: tommaso barbugli <tb...@gmail.com>
> To: user@cassandra.apache.org
> Date: 02/07/2014 10:14
> Subject: keyspace with hundreds of columnfamilies
>
> Hi,
> Are there any known issues or shortcomings with organising data in
> hundreds of column families?
> At present I am running with 300 column families, but I expect
> that to grow to a couple of thousand.
> Is this discouraged or unsupported? (I am using Cassandra 2.0.)
>
> Thanks
> Tommaso