Posted to user@cassandra.apache.org by Toru Inoko <in...@ms.scsk.jp> on 2012/06/05 02:31:43 UTC

Re: about multitenant datamodel

> IMHO a model that allows external users to create CF's is a bad one.

Why do you think so? I'll let users create restricted CFs, and limit the
number of CFs each user can create.
Is it still a bad one?

On Thu, 31 May 2012 06:44:05 +0900, aaron morton <aa...@thelastpickle.com>  
wrote:

>> - Do a lot of keyspaces cause some problems? (If I have 1,000 users,  
>> cassandra creates 1,000 keyspaces…)
> It's not keyspaces, but the number of column families.
>
> Without storing any data each CF uses about 1MB of ram. When they start  
> storing and reading data they use more.
>
> IMHO a model that allows external users to create CF's is a bad one.
>
> Hope that helps.
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/05/2012, at 12:52 PM, Toru Inoko wrote:
>
>> Hi, all.
>>
>> I'm designing a data API service (like cassandra.io, but without a
>> dedicated server for each user) on Cassandra 1.1, on which users can run
>> DML/DDL operations, as in CQL.
>> The following API is available to users (almost the same as the Cassandra API):
>> - create/read/delete ColumnFamilies/Rows/Columns
>>
>> Now I'm thinking about a multitenant data model for it.
>> My data model looks like the following.
>> I'm going to prepare a keyspace for each user as that user's tenant space.
>>
>> | keyspace1 | --- | column family |
>> |(for user1)|  |
>>               ...
>>
>> | keyspace2 | --- | column family |
>> |(for user2)|  |
>>               ...
>>
>> These are my questions:
>> - Is this a good data model for multitenancy?
>> - Do a lot of keyspaces cause problems? (If I have 1,000 users,
>> Cassandra creates 1,000 keyspaces...)
>>
>> Please help.
>> Thank you in advance.
>>
>> Toru Inoko.
>>
>


-- 
-----------------------------------
SCSK Corporation
Technology, Quality and Information Group, Technology Development Department
Advanced Technology Section

Toru Inoko
tel       : 03-6438-3544
mail      : inoko@ms.scsk.jp
-----------------------------------


Re: about multitenant datamodel

Posted by Toru Inoko <in...@ms.scsk.jp>.
> See virtual keyspaces in Hector.
Yes. At first I tried to design a data model like the POD architecture
(http://goo.gl/Uw1yD) with this.
But it is a problem for me that strong consistency isn't guaranteed among
the metadata schemas.

> Every CF has a certain amount of overhead in memory. It's just not how  
> Cassandra is designed to be used.
Thanks. I'll try again to design a metadata schema data model that has
strong consistency.

Thank you for your advice!

On Wed, 06 Jun 2012 03:35:40 +0900, aaron morton <aa...@thelastpickle.com>  
wrote:

>> With an abstraction layer you can store practically anything in  
>> Cassandra.
> See virtual keyspaces in Hector.
>
>> Why do you think so? I'll let users create restricted CFs, and limit the
>> number of CFs each user can create.
>> Is it still a bad one?
> Depends what your limits are, but in general still yes.
>
> If someone creates a CF with 10 secondary indexes they will use more  
> resources than someone who creates a CF with none. Same thing would  
> happen in a multitenant RDBMS server.
>
> If you have 200 CF's in a cluster it will use more memory than one with  
> 20 CF's. The extra memory use will result in more disk IO.
>
> Cheers
>
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 5/06/2012, at 7:52 PM, R. Verlangen wrote:
>
>> Every CF has a certain amount of overhead in memory. It's just not how  
>> Cassandra is designed to be used. Maybe you could think of a way to  
>> smash data down to indices and entities. With an abstraction layer you  
>> can store practically anything in Cassandra.


-- 
-----------------------------------
SCSK Corp.

Toru Inoko
tel       : 03-6438-3544
mail      : inoko@ms.scsk.jp
-----------------------------------


Re: about multitenant datamodel

Posted by aaron morton <aa...@thelastpickle.com>.
> With an abstraction layer you can store practically anything in Cassandra.
See virtual keyspaces in Hector. 

> Why do you think so? I'll let users create restricted CFs, and limit the number of CFs each user can create.
> Is it still a bad one?
Depends what your limits are, but in general still yes. 

If someone creates a CF with 10 secondary indexes they will use more resources than someone who creates a CF with none. Same thing would happen in a multitenant RDBMS server. 

If you have 200 CF's in a cluster it will use more memory than one with 20 CF's. The extra memory use will result in more disk IO.

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/06/2012, at 7:52 PM, R. Verlangen wrote:

> Every CF has a certain amount of overhead in memory. It's just not how Cassandra is designed to be used. Maybe you could think of a way to smash data down to indices and entities. With an abstraction layer you can store practically anything in Cassandra.


Re: about multitenant datamodel

Posted by samal <sa...@gmail.com>.
> Why do you think so? I'll let users create restricted CFs, and limit the
> number of CFs each user can create.
> Is it still a bad one?

OK, I get it: you want to limit the number of CFs a user can create to
(assume) 2. What about 10k users on a shared cluster creating 2 CFs each?
That is 20k CFs, or roughly 20GB of memory used with no data in them.
Do you think that is a good one?
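
The arithmetic above can be checked with a small sketch. It assumes the rough figure quoted earlier in this thread (about 1MB of RAM per empty CF); the function and constant names are made up for illustration and are not part of any Cassandra API.

```python
# Assumption taken from this thread: an empty column family costs
# roughly 1 MB of RAM. Real overhead depends on version and settings.
MB_PER_EMPTY_CF = 1


def baseline_cf_memory_mb(users, cfs_per_user, mb_per_cf=MB_PER_EMPTY_CF):
    """Estimate the memory used by empty column families alone, in MB."""
    return users * cfs_per_user * mb_per_cf


total_mb = baseline_cf_memory_mb(users=10_000, cfs_per_user=2)
print(f"{total_mb:.0f} MB (~{total_mb / 1000:.0f} GB)")  # 20000 MB (~20 GB)
```

This is only the baseline before any data is written; as noted earlier in the thread, CFs use more once they start storing and reading data.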

I think of your data model as being like S3 or shared hosting, which limit
keyspaces and CFs to a fixed number.
In Cassandra the key and the column name are very powerful; you can do
anything you want and design the data model any way you want.

Here is the approach I would probably take.

   - Limit users to keys; users cannot create/delete CFs.
   - All users share the same CFs.
   - Give each user a unique signature (which MUST NOT clash), like
     "username<==>anyothermarker::[[actual key name].....n]", UTF-8 only.
   - Each user always prefixes this signature to keys in every CF when
     inserting and reading data.
   - As with an S3 bucket, check the signature before creating a new one for
     a new user.
   - Each key for a user acts like a bucket; all of its columns can be the
     bucket data.

E.g.
1)
profileCF{

  user1<==>123456::profile{
                     /* user1 profile */
  } ,
  user2<==>4444444::profile{
                     /* user2 profile */
  } ,
}

2)
activityCF{

  user1<==>123456::activity{
                     /* user1 activity columns here */
  } ,
  user2<==>4444444::activity{
                     /* user2 activity columns here */
  } ,
}

A marker CF keeps the unique signatures of all users, so it can be
queried when creating a new one.

bucketMarkerCF{
     user2<==>4444444:{
                username:"
     }
     user1<==>2323:{
                username:"
     }

}

The problem with this approach is that users may not have the liberty to
define their own data model. It is good for fixed-pattern data: logs, hits,
geodata.
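
A minimal sketch of the shared-CF idea described above, using plain Python dicts as a stand-in for column families. Nothing here talks to Cassandra; the class and method names (SharedStore, register, insert, read) and the sample column values are hypothetical and only illustrate the signature-prefix scheme and the marker lookup.

```python
class SignatureTaken(Exception):
    """Raised when a tenant signature is already registered."""


class SharedStore:
    """In-memory stand-in for shared column families (not a Cassandra client)."""

    def __init__(self):
        self.markers = {}  # plays the role of bucketMarkerCF
        self.cfs = {}      # CF name -> {prefixed row key -> {column: value}}

    def register(self, username, marker):
        """Create a unique signature, refusing one that already exists."""
        if "<==>" in username or "::" in username:
            raise ValueError("username must not contain the delimiters")
        signature = f"{username}<==>{marker}"
        if signature in self.markers:
            raise SignatureTaken(signature)
        self.markers[signature] = {"username": username}
        return signature

    def _row_key(self, signature, key):
        # Every row key is prefixed with the tenant signature.
        if signature not in self.markers:
            raise KeyError(f"unknown signature: {signature}")
        return f"{signature}::{key}"

    def insert(self, cf, signature, key, columns):
        rows = self.cfs.setdefault(cf, {})
        rows.setdefault(self._row_key(signature, key), {}).update(columns)

    def read(self, cf, signature, key):
        rows = self.cfs.get(cf, {})
        return dict(rows.get(self._row_key(signature, key), {}))


store = SharedStore()
user1 = store.register("user1", "123456")
user2 = store.register("user2", "4444444")
store.insert("profileCF", user1, "profile", {"city": "Tokyo"})
store.insert("profileCF", user2, "profile", {"city": "Pune"})
print(store.read("profileCF", user1, "profile"))  # {'city': 'Tokyo'}
```

Both tenants write to the same profileCF, but each only ever addresses rows under its own prefix, so the number of CFs stays fixed no matter how many users sign up. In a real deployment the prefixing would have to be enforced by the API layer, since Cassandra itself would not stop one tenant from reading another tenant's keys.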

/Samal



Re: about multitenant datamodel

Posted by "R. Verlangen" <ro...@us2.nl>.
Every CF has a certain amount of overhead in memory. It's just not how
Cassandra is designed to be used. Maybe you could think of a way to smash
data down to indices and entities. With an abstraction layer you can store
practically anything in Cassandra.

2012/6/5 Toru Inoko <in...@ms.scsk.jp>

> IMHO a model that allows external users to create CF's is a bad one.
>>
>
> Why do you think so? I'll let users create restricted CFs, and limit the
> number of CFs each user can create.
> Is it still a bad one?


-- 
With kind regards,

Robin Verlangen
Software engineer

W http://www.robinverlangen.nl
E robin@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.