Posted to user@cassandra.apache.org by Rafael Almeida <al...@yahoo.com> on 2011/12/21 14:45:13 UTC

Creating column families per client

Hello,

I am evaluating Cassandra for my system. I will have several clients who won't share data with each other. My idea is to create one column family per client. When a new client comes in and adds data to the system, I'd like to create a column family dynamically. Is that reliable? Can I create a column family on a node, immediately add new data to that column family, and be confident that the data will eventually become visible to a read?

Regards,
Rafael



Re: Creating column families per client

Posted by Philippe <wa...@gmail.com>.
Every node? I hadn't realized that. Is there a way I can compute how much
memory is being 'wasted'?

Re: Creating column families per client

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi, I don't know if this is technically possible, but I just want to
warn you about creating a lot of column families. Once you have a lot of
clients, you will have a lot of column families and, if I'm right, each
column family uses memory on every node. You will run out of memory very
fast, because memory usage per column family does not scale horizontally
(adding nodes won't solve the problem; you will have to upgrade the
hardware on existing nodes, which is not Cassandra's philosophy).
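
To make the scaling argument above concrete, here is a rough sketch of the
arithmetic. The per-column-family overhead used below is a purely
hypothetical placeholder, not a measured Cassandra figure, and the function
names are made up for illustration. The only point is that, if every node
carries every column family, the per-node cost does not shrink as nodes are
added.

```python
def per_node_overhead_mb(num_column_families, overhead_per_cf_mb):
    """Memory each node spends on column-family overhead.

    Assumption: every node holds every column family, so the result
    does not depend on how many nodes the cluster has.
    """
    return num_column_families * overhead_per_cf_mb


def overhead_by_cluster_size(num_column_families, overhead_per_cf_mb, node_counts):
    """Per-node overhead for several cluster sizes (identical by construction)."""
    return {
        nodes: per_node_overhead_mb(num_column_families, overhead_per_cf_mb)
        for nodes in node_counts
    }


# Hypothetical numbers: 5000 clients, one column family each,
# and an assumed 1 MB of overhead per column family.
report = overhead_by_cluster_size(5000, 1.0, node_counts=[3, 30])
for nodes, mb in report.items():
    print(f"{nodes} nodes: {mb} MB of overhead on each node")
```

With real figures for your Cassandra version in place of the placeholder,
the same multiplication gives a first estimate of the memory involved.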

I think you should use rows or columns inside your column families as much
as possible, unless you really have only a few clients.

Once again, I'm not a Cassandra expert. I might be wrong :-).

Alain


Re: Creating column families per client

Posted by Nick Bailey <ni...@datastax.com>.
The overhead for column families was greatly reduced in 0.8 and 1.0.
It should now be possible to have hundreds or thousands of column
families. The 'memtable_total_space_in_mb' setting was introduced to
provide a global memtable threshold, and Cassandra will handle
flushing on its own.

See http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
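
As a way to picture what a global memtable threshold means, here is a toy
model in Python. It is not Cassandra's code. It assumes, for illustration
only, that when the combined memtable size exceeds the limit the node
flushes the largest memtable first, and the column family names and sizes
are invented.

```python
def flush_until_under_limit(memtable_sizes_mb, total_limit_mb):
    """Toy model of a global memtable threshold.

    Repeatedly "flushes" (removes) the largest memtable until the combined
    size is at or below total_limit_mb. Returns (flushed_names, remaining).
    """
    remaining = dict(memtable_sizes_mb)  # do not mutate the caller's dict
    flushed = []
    while remaining and sum(remaining.values()) > total_limit_mb:
        largest = max(remaining, key=remaining.get)
        flushed.append(largest)
        del remaining[largest]
    return flushed, remaining


# Invented per-column-family memtable sizes, in MB.
sizes = {"client_a": 300, "client_b": 120, "client_c": 50}
flushed, remaining = flush_until_under_limit(sizes, total_limit_mb=200)
print("flushed:", flushed)
print("still in memory:", remaining)
```

The point of the model is that a single limit covers all column families
together, so no per-column-family threshold has to be tuned.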

Another thing you should consider is the lack of built-in access
controls. There is an authentication/authorization interface you can
plug into, and there are examples in the examples/ directory of the
source download.


Re: Creating column families per client

Posted by Ryan Lowe <ry...@gmail.com>.
What we have done to avoid creating multiple column families is to sort of
namespace the row key.  So if we have a column family of Users and
accounts: "AccountA" and "AccountB", we do the following:

Column Family User:
   "AccountA/ryan" : { first: Ryan, last: Lowe }
   "AccountB/ryan" : { first: Ryan, last: Smith}

etc.

For our needs, this did the same thing as having two "User" column
families, one for "AccountA" and one for "AccountB".
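
A minimal sketch of the key-namespacing idea described above, in Python.
The helper names are hypothetical, and a plain dict stands in for the single
"User" column family; no Cassandra client API is used or implied.

```python
from typing import Tuple

SEPARATOR = "/"  # same separator as in the example keys above


def make_row_key(account: str, user: str) -> str:
    """Build a row key namespaced by account, e.g. 'AccountA/ryan'."""
    if SEPARATOR in account:
        # Otherwise the key could not be split back unambiguously.
        raise ValueError("account name must not contain the separator")
    return f"{account}{SEPARATOR}{user}"


def split_row_key(row_key: str) -> Tuple[str, str]:
    """Recover (account, user) from a namespaced row key."""
    account, user = row_key.split(SEPARATOR, 1)
    return account, user


# One shared "User" column family, modelled here as a dict of rows.
users = {
    make_row_key("AccountA", "ryan"): {"first": "Ryan", "last": "Lowe"},
    make_row_key("AccountB", "ryan"): {"first": "Ryan", "last": "Smith"},
}

print(users[make_row_key("AccountB", "ryan")]["last"])
```

The same user name can exist under both accounts without a collision
because the account prefix is part of the key.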

Ryan


Re: Creating column families per client

Posted by Flavio Baronti <f....@list-group.com>.
Hi,

Based on my experience with Cassandra 0.7.4, I strongly discourage you from doing that: we tried dynamic creation of
column families, and it was a nightmare.
First of all, the operation cannot be done concurrently, so you must find a way to avoid parallel creation (across
the whole cluster, not just on a single node).
The main problem, however, is with timestamps. The structure of your keyspace is versioned with a time-dependent id,
which is assigned by the host where you perform the schema update, based on that machine's local time. If you do two
updates in close succession on two different nodes, and their clocks are not perfectly synchronized (and they never
will be), Cassandra might be confused by their relative ordering and stop working altogether.
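
The ordering problem described above can be illustrated with a toy model in
Python. This is not how Cassandra computes schema versions internally; the
Node class, the offsets, and the times are all invented to show how an id
derived from local clock time can disagree with the real order of events.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    clock_offset_ms: int  # how far this node's clock is from true time

    def schema_version_id(self, true_time_ms: int) -> int:
        """Time-dependent id as seen by this node's (skewed) local clock."""
        return true_time_ms + self.clock_offset_ms


node_a = Node("a", clock_offset_ms=40)   # clock runs 40 ms fast
node_b = Node("b", clock_offset_ms=-40)  # clock runs 40 ms slow

# Two schema updates in close succession: the first on node a,
# the second 50 ms later (in true time) on node b.
first = node_a.schema_version_id(true_time_ms=1000)
second = node_b.schema_version_id(true_time_ms=1050)

# Sorting by id puts the later update before the earlier one.
ordered_by_id = [name for _, name in sorted([(first, "first"), (second, "second")])]
print(first, second, ordered_by_id)
```

Any skew larger than the gap between the two updates is enough to reverse
their apparent order.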

Bottom line: don't.

Flavio
