Posted to user@cassandra.apache.org by tommaso barbugli <tb...@gmail.com> on 2014/07/02 10:13:41 UTC

keyspace with hundreds of columnfamilies

Hi,
Are there any known issues or shortcomings with organising data in hundreds
of column families?
At present I am running with 300 column families, but I expect that to
grow to a couple of thousand.
Is this something discouraged or unsupported? (I am using Cassandra 2.0.)

Thanks
Tommaso

Re: keyspace with hundreds of columnfamilies

Posted by tommaso barbugli <tb...@gmail.com>.
Hi,
I am building a sort of DB as a service (more precisely, one DB table as a
service), and I want every user to have their own storage, as isolated as
possible (and to give them some freedom in terms of schema customisation and
the ability to build secondary (2i) indexes).
Do you know what kind of memory cost we are talking about? Is this going to
affect clients too (I am using the DataStax Python driver), e.g. by flooding
them with metadata?

Thanks
Tommaso


2014-07-02 13:22 GMT+02:00 Jonathan Lacefield <jl...@datastax.com>:

> Hello,
> There is a memory overhead for each column family. This type of
> configuration could cause heap issues. What is driving the
> requirement for so many CFs?

Re: keyspace with hundreds of columnfamilies

Posted by Jonathan Lacefield <jl...@datastax.com>.
Hello,
There is a memory overhead for each column family. This type of
configuration could cause heap issues. What is driving the
requirement for so many CFs?

> On Jul 2, 2014, at 4:14 AM, tommaso barbugli <tb...@gmail.com> wrote:
>
> Hi,
> Are there any known issues or shortcomings with organising data in hundreds of column families?
> At present I am running with 300 column families, but I expect that to grow to a couple of thousand.
> Is this something discouraged or unsupported? (I am using Cassandra 2.0.)
>
> Thanks
> Tommaso

Re: keyspace with hundreds of columnfamilies

Posted by Jack Krupansky <ja...@basetechnology.com>.
If your 1K tables might grow to 5K or 10K, then doesn’t that mean you would be trying to add columns later, after you’ve populated your data? If so, that would argue for using one or more map columns to accommodate the dynamic addition of pseudo-columns.

Once again, look at your queries (both as they are today and as they will be in the future as you expand the data), since they will be your ultimate guide to how to model your data.

And drill deeper into how you will be inserting and updating the data in “groups” – that will guide the data modeling as well. What will the typical update use cases look like?

By all means, start simple, but also be careful not to paint yourself into a corner. Alternatively, be prepared to throw away entire implementations as your conceptualization of the data evolves.

-- Jack Krupansky
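Jack's map-column suggestion can be sketched as follows. This is a minimal Python sketch that only builds CQL strings; the keyspace, table and column names (`app`, `records`, `tenant_id`, `record_id`, `attrs`) are hypothetical. The point is that a new pseudo-column becomes a new map key written by an ordinary `UPDATE`, rather than a schema change.

```python
# Hypothetical names throughout; only the CQL shapes matter here.


def shared_table_ddl(keyspace: str, table: str) -> str:
    """One shared table: tenant_id is the partition key, record_id a
    clustering column, and a map<text, text> column holds pseudo-columns."""
    return (
        f"CREATE TABLE {keyspace}.{table} ("
        "tenant_id text, "
        "record_id text, "
        "attrs map<text, text>, "
        "PRIMARY KEY ((tenant_id), record_id))"
    )


def set_pseudo_column_cql(keyspace: str, table: str) -> str:
    """Sets one map entry. The map key is a bind marker, so adding a new
    pseudo-column is a plain write, not an ALTER TABLE."""
    return (
        f"UPDATE {keyspace}.{table} SET attrs[?] = ? "
        "WHERE tenant_id = ? AND record_id = ?"
    )


print(shared_table_ddl("app", "records"))
print(set_pseudo_column_cql("app", "records"))
```

With the DataStax Python driver, the UPDATE would typically be prepared once and executed with `(key, value, tenant_id, record_id)` as parameters. CQL collections are intended for relatively small amounts of data, so this fits a modest number of short attributes rather than unbounded growth.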

From: tommaso barbugli 
Sent: Saturday, July 12, 2014 3:12 PM
To: user@cassandra.apache.org 
Subject: Re: keyspace with hundreds of columnfamilies

Hi Jack,
thank you for your clear answer!

On Saturday, 12 July 2014, Jack Krupansky <ja...@basetechnology.com> wrote:

  1. What does your data look like – 100 small integers or short strings and dates, or... 100 massive blobs?

It will be only short strings/varints; no blobs or nested data.



  2. What operations are you doing on those rows – reading and updating individual columns, or mostly full-row upserts?

Mostly reading and writing groups of columns (previously I had those sets of columns in different CFs).

  3. 100 columns in a CQL row is not so unreasonable, per se.

  4. The ultimate answer to any “how will it perform” question is to do a “proof of concept” implementation since it really all depends on your actual data and hardware setup, such as memory, cpu, I/O, and networking – IOW, all the non-Cassandra factors can easily dwarf Cassandra itself.

  5. As far as 1K tables with 10 columns vs. 100 tables with 100 columns – it should primarily be your queries (and updates) that drive the decision. Do fewer tables and more columns make your queries (and updates) a lot simpler and cleaner?

Yes, code-wise it does; I am just scared that I will get into a bad situation when the 1K CFs grow to 5K or 10K.


-- 
sent from iphone (sorry for the typos)

Re: keyspace with hundreds of columnfamilies

Posted by tommaso barbugli <tb...@gmail.com>.
Hi Jack,
thank you for your clear answer!

On Saturday, 12 July 2014, Jack Krupansky <ja...@basetechnology.com> wrote:

>   1. What does your data look like – 100 small integers or short strings
> and dates, or... 100 massive blobs?
>

It will be only short strings/varints; no blobs or nested data.


> 2. What operations are you doing on those rows – reading and updating
> individual columns, or mostly full-row upserts?
>

Mostly reading and writing groups of columns (previously I had those sets of
columns in different CFs).

>
> 3. 100 columns in a CQL row is not so unreasonable, per se.
>
> 4. The ultimate answer to any “how will it perform” question is to do a
> “proof of concept” implementation since it really all depends on your
> actual data and hardware setup, such as memory, cpu, I/O, and networking –
> IOW, all the non-Cassandra factors can easily dwarf Cassandra itself.
>
> 5. As far as 1K tables with 10 columns vs. 100 tables with 100 columns –
> it should primarily be your queries (and updates) that drive the decision.
> Do fewer tables and more columns make your queries (and updates) a lot
> simpler and cleaner?
>

Yes, code-wise it does; I am just scared that I will get into a bad
situation when the 1K CFs grow to 5K or 10K.


>
> -- Jack Krupansky


-- 
sent from iphone (sorry for the typos)

Re: keyspace with hundreds of columnfamilies

Posted by Jack Krupansky <ja...@basetechnology.com>.
1. What does your data look like – 100 small integers or short strings and dates, or... 100 massive blobs?

2. What operations are you doing on those rows – reading and updating individual columns, or mostly full-row upserts?

3. 100 columns in a CQL row is not so unreasonable, per se.

4. The ultimate answer to any “how will it perform” question is to do a “proof of concept” implementation since it really all depends on your actual data and hardware setup, such as memory, cpu, I/O, and networking – IOW, all the non-Cassandra factors can easily dwarf Cassandra itself.

5. As far as 1K tables with 10 columns vs. 100 tables with 100 columns – it should primarily be your queries (and updates) that drive the decision. Do fewer tables and more columns make your queries (and updates) a lot simpler and cleaner?

-- Jack Krupansky

From: tommaso barbugli 
Sent: Saturday, July 12, 2014 7:58 AM
To: user@cassandra.apache.org 
Subject: Re: keyspace with hundreds of columnfamilies

Hi,
how is a table with hundreds of columns going to perform?

I am moving from 1K column families, each with 10 columns, to 100 CFs, each with 100 columns.

Thank you,
Tommaso

-- 
sent from iphone (sorry for the typos)

Re: keyspace with hundreds of columnfamilies

Posted by tommaso barbugli <tb...@gmail.com>.
Hi,
how is a table with hundreds of columns going to perform?

I am moving from 1K column families, each with 10 columns, to 100 CFs, each
with 100 columns.

Thank you,
Tommaso

On Friday, 11 July 2014, Sourabh Agrawal <ii...@gmail.com> wrote:

> Yes, what about CQL-style columns? Please clarify.
> --
> Sourabh Agrawal
> Bangalore
> +91 9945657973
>


-- 
sent from iphone (sorry for the typos)

Re: keyspace with hundreds of columnfamilies

Posted by Sourabh Agrawal <ii...@gmail.com>.
Yes, what about CQL-style columns? Please clarify.


On Sat, Jul 5, 2014 at 12:32 PM, tommaso barbugli <tb...@gmail.com>
wrote:

> Yes, my question was about CQL-style columns.


-- 
Sourabh Agrawal
Bangalore
+91 9945657973

Re: keyspace with hundreds of columnfamilies

Posted by tommaso barbugli <tb...@gmail.com>.
Yes, my question was about CQL-style columns.


2014-07-04 12:40 GMT+02:00 Jens Rantil <je...@tink.se>:

> Just so you guys aren't misunderstanding each other: Tommaso, you were not
> referring to CQL-style columns, right?
>
> /J

Re: keyspace with hundreds of columnfamilies

Posted by Jens Rantil <je...@tink.se>.
Just so you guys aren't misunderstanding each other: Tommaso, you were not
referring to CQL-style columns, right?

/J


On Fri, Jul 4, 2014 at 10:18 AM, Romain HARDOUIN <ro...@urssaf.fr>
wrote:

> Cassandra can handle many more columns (e.g. time series).
> So 100 columns is OK.
>
> Best,
> Romain

Re: keyspace with hundreds of columnfamilies

Posted by Romain HARDOUIN <ro...@urssaf.fr>.
Cassandra can handle many more columns (e.g. time series).
So 100 columns is OK.

Best,
Romain



tommaso barbugli <tb...@gmail.com> wrote on 03/07/2014 21:55:18:

> From: tommaso barbugli <tb...@gmail.com>
> To: user@cassandra.apache.org
> Date: 03/07/2014 21:55
> Subject: Re: keyspace with hundreds of columnfamilies
>
> thank you for the replies; I am rethinking the schema design. One
> possible solution is to "implode" one dimension and get N times fewer
> CFs.
> With this approach I would come up with (CQL) tables with up to 100
> columns; would that be a problem?
> 
> Thank You,
> Tommaso
> 

Re: keyspace with hundreds of columnfamilies

Posted by Ilya Sviridov <is...@mirantis.com>.
Tommaso, looking at your description of the architecture, an idea came up.

You could shard on the Cassandra client side and write to different
Cassandra clusters, to keep the number of column families per cluster reasonable.

With best regards,
Ilya
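Ilya's client-side sharding idea can be sketched as a stable tenant-to-cluster mapping. This is a minimal Python sketch, assuming each tenant's tables live entirely on one cluster; the cluster names are hypothetical, and connecting to the chosen cluster is left out.

```python
import hashlib
from typing import Sequence


def pick_cluster(tenant_id: str, clusters: Sequence[str]) -> str:
    """Route a tenant to one cluster using a stable hash.

    sha256 is used instead of hash() because Python salts str hashes per
    process, which would send a tenant to a different cluster after a restart.
    """
    if not clusters:
        raise ValueError("need at least one cluster")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return clusters[int.from_bytes(digest[:8], "big") % len(clusters)]


clusters = ["cluster-a", "cluster-b", "cluster-c"]  # hypothetical names
print(pick_cluster("tenant-42", clusters))
```

Hash-modulo routing is the simplest to show, but adding a cluster remaps most tenants; a persisted tenant-to-cluster lookup table avoids that at the cost of one extra lookup.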


On Thu, Jul 3, 2014 at 10:55 PM, tommaso barbugli <tb...@gmail.com>
wrote:

> thank you for the replies; I am rethinking the schema design. One possible
> solution is to "implode" one dimension and get N times fewer CFs.
> With this approach I would come up with (CQL) tables with up to 100
> columns; would that be a problem?
>
> Thank You,
> Tommaso
>

Re: keyspace with hundreds of columnfamilies

Posted by tommaso barbugli <tb...@gmail.com>.
thank you for the replies; I am rethinking the schema design. One possible
solution is to "implode" one dimension and get N times fewer CFs.
With this approach I would come up with (CQL) tables with up to 100
columns; would that be a problem?

Thank You,
Tommaso


2014-07-02 23:43 GMT+02:00 Jack Krupansky <ja...@basetechnology.com>:

>   The official answer, engraved in stone tablets, and carried down from
> the mountain: “Although having more than dozens or hundreds of tables
> defined is almost certainly a Bad Idea (just as it is a design smell in a
> relational database), it's relatively straightforward to allow disabling
> the SlabAllocator.” Emphasis on “almost certainly a Bad Idea.”
>
> See:
> https://issues.apache.org/jira/browse/CASSANDRA-5935
> “Allow disabling slab allocation”
>
> IOW, this is considered an anti-pattern, but...
>
> -- Jack Krupansky

Re: keyspace with hundreds of columnfamilies

Posted by Jack Krupansky <ja...@basetechnology.com>.
The official answer, engraved in stone tablets, and carried down from the mountain: “Although having more than dozens or hundreds of tables defined is almost certainly a Bad Idea (just as it is a design smell in a relational database), it's relatively straightforward to allow disabling the SlabAllocator.” Emphasis on “almost certainly a Bad Idea.”

See:
https://issues.apache.org/jira/browse/CASSANDRA-5935
“Allow disabling slab allocation”

IOW, this is considered an anti-pattern, but...

-- Jack Krupansky

From: tommaso barbugli 
Sent: Wednesday, July 2, 2014 2:16 PM
To: user@cassandra.apache.org 
Subject: Re: keyspace with hundreds of columnfamilies

Hi,
thank you for your replies on this. Regarding the arena memory: is this a fixed memory allocation, or is it some sort of in-memory caching? I ask because I think that a substantial portion of the column families created will not be queried that frequently (and some will become inactive and stay that way for a really long time).

Thank you,
Tommaso




Re: keyspace with hundreds of columnfamilies

Posted by tommaso barbugli <tb...@gmail.com>.
Hi,
thank you for your replies on this. Regarding the arena memory: is this a
fixed memory allocation, or is it some sort of in-memory caching? I ask because
I think that a substantial portion of the column families created will not
be queried that frequently (and some will become inactive and stay that way
for a really long time).

Thank you,
Tommaso


2014-07-02 18:35 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:

> Arena allocation is an improvement feature, not a limitation.
> It was introduced in Cassandra 1.0 in order to lower memory fragmentation
> (and therefore promotion failures).
> AFAIK it's not intended to be tweaked, so it might not be a good idea to
> change it.
>
> Best,
> Romain

Re: keyspace with hundreds of columnfamilies

Posted by Romain HARDOUIN <ro...@urssaf.fr>.
Arena allocation is an improvement feature, not a limitation.
It was introduced in Cassandra 1.0 in order to lower memory fragmentation
(and therefore promotion failures).
AFAIK it's not intended to be tweaked, so it might not be a good idea to
change it.

Best,
Romain
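To illustrate the trade-off behind arena allocation, here is a generic toy arena (bump-pointer) allocator in Python. It is not Cassandra's SlabAllocator: it only shows that memory is reserved in whole regions (1 MB here, the figure quoted in this thread) even when very little of a region is in use. Whether a region is reserved up front or on first write is an assumption of this sketch, which opens one on the first allocation.

```python
REGION_SIZE = 1024 * 1024  # 1 MB regions, the size quoted in this thread


class ArenaAllocator:
    """Toy bump-pointer arena: small allocations are carved out of large,
    fixed-size regions instead of being allocated one by one."""

    def __init__(self, region_size: int = REGION_SIZE) -> None:
        self.region_size = region_size
        self.regions = []  # every region opened so far (each a bytearray)
        self.offset = 0    # next free byte in the newest region
        self.used = 0      # bytes actually handed out to callers

    def allocate(self, size: int) -> memoryview:
        if not 0 < size <= self.region_size:
            raise ValueError("size must be between 1 and region_size")
        # Open the first region lazily, or a fresh one when the current is full;
        # any unused tail of the old region is simply wasted.
        if not self.regions or self.offset + size > self.region_size:
            self.regions.append(bytearray(self.region_size))
            self.offset = 0
        start = self.offset
        self.offset += size
        self.used += size
        # Return a view into the region rather than a separate object.
        return memoryview(self.regions[-1])[start:start + size]

    @property
    def reserved(self) -> int:
        """Bytes held by regions, however little of them is in use."""
        return len(self.regions) * self.region_size


arena = ArenaAllocator()
arena.allocate(100)
print(arena.used, arena.reserved)  # 100 1048576
```

After a single 100-byte allocation, `used` is 100 but `reserved` is a full region; that gap is the kind of per-CF cost under discussion.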

tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 17:40:18:

> From: tommaso barbugli <tb...@gmail.com>
> To: user@cassandra.apache.org
> Date: 02/07/2014 17:40
> Subject: Re: keyspace with hundreds of columnfamilies
>
> 1MB per column family sounds pretty bad to me; is this something I
> can tweak or work around somehow?
> 
> Thanks
> Tommaso
> 


Re: keyspace with hundreds of columnfamilies

Posted by tommaso barbugli <tb...@gmail.com>.
1MB per column family sounds pretty bad to me; is this something I can
tweak or work around somehow?

Thanks
Tommaso


2014-07-02 17:21 GMT+02:00 Romain HARDOUIN <ro...@urssaf.fr>:

> The trap is that each CF will consume 1 MB of memory due to arena
> allocation.
> This might seem harmless, but if you plan thousands of CFs, it means
> thousands of megabytes...
> Up to 1,000 CFs I think it could be doable, but not 10,000.
>
> Best,
>
> Romain

RE: keyspace with hundreds of columnfamilies

Posted by Romain HARDOUIN <ro...@urssaf.fr>.
The trap is that each CF will consume 1 MB of memory due to arena
allocation.
This might seem harmless, but if you plan thousands of CFs, it means
thousands of megabytes...
Up to 1,000 CFs I think it could be doable, but not 10,000.

Best,

Romain
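The arithmetic behind this warning, as a rough Python sketch. It assumes, as stated in this message, about 1 MB of arena memory per column family, and treats the result as a per-node lower bound that ignores every other per-table cost; the table counts are the ones mentioned in this thread.

```python
def arena_overhead_mb(num_tables: int, region_mb: float = 1.0) -> float:
    """Rough lower bound on heap (in MB) held by arena regions on one node,
    assuming one region of `region_mb` per table (the 1 MB figure above)."""
    return num_tables * region_mb


# Table counts from the thread: the 100-CF redesign, today's 300,
# and the projected 1K and 10K.
for tables in (100, 300, 1_000, 10_000):
    mb = arena_overhead_mb(tables)
    print(f"{tables:>6} tables -> ~{mb:,.0f} MB (~{mb / 1024:.1f} GB) of heap")
```

At 10,000 tables this comes to roughly 10 GB of heap for arenas alone.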


tommaso barbugli <tb...@gmail.com> wrote on 02/07/2014 10:13:41:

> From: tommaso barbugli <tb...@gmail.com>
> To: user@cassandra.apache.org
> Date: 02/07/2014 10:14
> Subject: keyspace with hundreds of columnfamilies
>
> Hi,
> Are there any known issues or shortcomings with organising data in
> hundreds of column families?
> At present I am running with 300 column families, but I expect
> that to grow to a couple of thousand.
> Is this something discouraged or unsupported? (I am using Cassandra 2.0.)
> 
> Thanks
> Tommaso