Posted to user@cassandra.apache.org by indika kumara <in...@gmail.com> on 2011/01/18 12:26:26 UTC

Re: Multi-tenancy, and authentication and authorization

Moving to user list

On Tue, Jan 18, 2011 at 4:05 PM, Aaron Morton <aa...@thelastpickle.com>wrote:

> Have a read about JVM heap sizing here
> http://wiki.apache.org/cassandra/MemtableThresholds
>
> If you let people create keyspaces with a mouse click you will soon run out
> of memory.
>
> I use Cassandra to provide a self service "storage service" at my
> organisation. All virtual databases operate in the same Cassandra keyspace
> (which does not change), and I use namespaces in the keys to separate
> things. Take a look at how Amazon S3 works; it may give you some ideas.
>
> If you want to continue the discussion let's move this to the user list.
>
> A
>
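Aaron's single-shared-keyspace approach can be sketched roughly like this; the `tenant/bucket/key` layout and separator are an illustrative convention in the spirit of his S3 comparison, not anything Cassandra or S3 prescribes:

```python
# Rough sketch of one fixed keyspace with tenant-namespaced row keys.
def row_key(tenant: str, bucket: str, key: str) -> str:
    """Build a row key that isolates tenants inside one shared keyspace."""
    return f"{tenant}/{bucket}/{key}"

def tenant_of(key: str) -> str:
    """Recover the tenant namespace from a row key."""
    return key.split("/", 1)[0]
```

With an order-preserving partitioner, one tenant's rows would then sort together, which is what would make S3-style prefix listings possible.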
>
> On 17/01/2011, at 7:44 PM, indika kumara <in...@gmail.com> wrote:
>
> > Hi Stu,
> >
> > In our app, we would like to offer Cassandra 'as-is' to tenants. In that
> > case, each tenant should be able to create keyspaces as needed. I expect
> > to implement it based on authorization. In my view, the implementation
> > options are as follows.
> >
> > 1) The name of a keyspace would be 'the actual keyspace name' + 'tenant ID'.
> >
> > 2) The name of a keyspace would not be changed, but the name of a column
> > family would be 'the actual column family name' + 'tenant ID'. A separate
> > mapping of keyspaces to tenants would be needed.
> >
> > 3) The name of a keyspace or a column family would not be changed, but the
> > name of a column would be 'the actual column name' + 'tenant ID'. Separate
> > mappings of keyspaces to tenants and of column families to tenants would
> > be needed.
> >
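All three options above splice a tenant ID into one level of the naming hierarchy. A minimal sketch (the separator and helper names are invented for illustration):

```python
SEP = "__"  # illustrative separator; any string illegal in user-chosen names works

def qualify(name: str, tenant_id: str) -> str:
    """Options 1-3: append the tenant ID to a keyspace, CF, or column name."""
    return f"{name}{SEP}{tenant_id}"

def unqualify(qualified: str) -> tuple[str, str]:
    """Split a qualified name back into (actual name, tenant ID)."""
    actual, _, tenant = qualified.rpartition(SEP)
    return (actual, tenant) if actual else (qualified, "")
```

Option 1 applies `qualify` to keyspace names only; options 2 and 3 apply it one level down and additionally need the keyspace-to-tenant (and column-family-to-tenant) mappings the message mentions.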
> > Could you please give your opinions on the above three options? If there
> > are any issues regarding the above approaches and those issues can be
> > solved, I would love to contribute to that.
> >
> > Thanks,
> >
> > Indika
> >
> >
> > On Fri, Jan 7, 2011 at 11:22 AM, Stu Hood <st...@gmail.com> wrote:
> >
> >>> (1) has the problem of multiple memtables (a large amount just isn't viable)
> >> There are some very straightforward solutions to this particular problem:
> >> I wouldn't rule out running with a very large number of
> >> keyspaces/column families given some minor changes.
> >>
> >> As Brandon said, some of the folks that were working on multi-tenancy
> for
> >> Cassandra are no longer focused on it. But the code that was generated
> >> during our efforts is very much available, and is unlikely to have gone
> >> stale. Would love to talk about this with you.
> >>
> >> Thanks,
> >> Stu
> >>
> >> On Thu, Jan 6, 2011 at 8:08 PM, indika kumara <in...@gmail.com>
> >> wrote:
> >>
> >>> Thank you very much Brandon!
> >>>
> >>> On Fri, Jan 7, 2011 at 12:40 AM, Brandon Williams <dr...@gmail.com>
> >>> wrote:
> >>>
> >>>> On Thu, Jan 6, 2011 at 12:33 PM, indika kumara <indika.kuma@gmail.com
> >>>>> wrote:
> >>>>
> >>>>> Hi Brandon,
> >>>>>
> >>>>> I would like your feedback on my two ideas for implementing
> >>>>> multi-tenancy with the existing implementation.  Would those be
> >>>>> possible to implement?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Indika
> >>>>>
> >>>>> Two vague ideas: (1) qualified keyspaces (by the tenant domain) (2)
> >>>>> multiple Cassandra storage configurations in a single node (one per
> >>>>> tenant).
> >>>>> For both options, the resource hierarchy would be /cassandra/
> >>>>> <cluster_name>/<tenant name (domain)>/keyspaces/<ks_name>/
> >>>>>
> >>>>
> >>>> (1) has the problem of multiple memtables (a large amount just isn't
> >>>> viable right now.)  (2) more or less has the same problem, but in JVM
> >>>> instances.
> >>>>
> >>>> I would suggest a) not trying to offer cassandra itself, and instead
> >>>> build a service that uses cassandra under the hood, and b) splitting up
> >>>> tenants in this layer.
> >>>>
> >>>> -Brandon
> >>>>
> >>>
> >>
>

Re: Multi-tenancy, and authentication and authorization

Posted by David Boxenhorn <da...@lookin2.com>.
I'm not sure that "you'd still want to retain the ability to individually
control how flushing happens on a per-cf basis in order to cater to
different workloads that benefit from different flushing behavior". It seems
to me that a good system-wide algorithm that works dynamically, and takes
moment-by-moment usage into account, can do this better than a human who is
guessing and making decisions on a static basis.

Having said that, my suggestion doesn't really depend so much on having one
memtable or many. Rather, it depends on making flushing behavior dependent
on system-wide parameters, which reflect the actual physical resources
available per node, rather than per-CF parameters (though per-CF tuning can
be taken into account, it should be a suggestion that gets overridden by
system-wide needs).



On Wed, Jan 19, 2011 at 10:48 AM, Peter Schuller <
peter.schuller@infidyne.com> wrote:

> > Right now there is a one-to-one mapping between memtables and SSTables.
> > Instead of that, would it be possible to have one giant memtable for each
> > Cassandra instance, with partial flushing to SSTs?
>
> I think a complication here is that, although I agree things need to
> be easier to tweak at least for the common case, I'm pretty sure you'd
> still want to retain the ability to individually control how flushing
> happens on a per-cf basis in order to cater to different workloads
> that benefit from different flushing behavior.
>
> I suspect the main concern here may be that there is a desire to have
> better overall control over how flushing happens and when writes start
> blocking, rather than necessarily implying that there can't be more
> than one memtable (the ticket Stu posted seems to address one such
> means of control).
>
> --
> / Peter Schuller
>

Re: Multi-tenancy, and authentication and authorization

Posted by Peter Schuller <pe...@infidyne.com>.
> Right now there is a one-to-one mapping between memtables and SSTables.
> Instead of that, would it be possible to have one giant memtable for each
> Cassandra instance, with partial flushing to SSTs?

I think a complication here is that, although I agree things need to
be easier to tweak at least for the common case, I'm pretty sure you'd
still want to retain the ability to individually control how flushing
happens on a per-cf basis in order to cater to different workloads
that benefit from different flushing behavior.

I suspect the main concern here may be that there is a desire to have
better overall control over how flushing happens and when writes start
blocking, rather than necessarily implying that there can't be more
than one memtable (the ticket Stu posted seems to address one such
means of control).

-- 
/ Peter Schuller

Re: Multi-tenancy, and authentication and authorization

Posted by David Boxenhorn <da...@lookin2.com>.
+1



On Wed, Jan 19, 2011 at 10:35 AM, Stu Hood <st...@gmail.com> wrote:

> Opened https://issues.apache.org/jira/browse/CASSANDRA-2006 with the
> solution we had suggested on the MultiTenant wiki page.
>
>
> On Tue, Jan 18, 2011 at 11:56 PM, David Boxenhorn <da...@lookin2.com>wrote:
>
>> I think tuning of Cassandra is overly complex, and even with a single
>> tenant you can run into problems with too many CFs.
>>
>> Right now there is a one-to-one mapping between memtables and SSTables.
>> Instead of that, would it be possible to have one giant memtable for each
>> Cassandra instance, with partial flushing to SSTs?
>>
>> It seems to me like a single memtable would make it MUCH easier to tune
>> Cassandra, since the decision whether to (partially) flush the memtable to
>> disk could be made on a node-wide basis, based on the resources you really
>> have, instead of the guess-work that we are forced to do today.
>>
>
>

Re: Multi-tenancy, and authentication and authorization

Posted by Stu Hood <st...@gmail.com>.
Opened https://issues.apache.org/jira/browse/CASSANDRA-2006 with the
solution we had suggested on the MultiTenant wiki page.

On Tue, Jan 18, 2011 at 11:56 PM, David Boxenhorn <da...@lookin2.com> wrote:

> I think tuning of Cassandra is overly complex, and even with a single
> tenant you can run into problems with too many CFs.
>
> Right now there is a one-to-one mapping between memtables and SSTables.
> Instead of that, would it be possible to have one giant memtable for each
> Cassandra instance, with partial flushing to SSTs?
>
> It seems to me like a single memtable would make it MUCH easier to tune
> Cassandra, since the decision whether to (partially) flush the memtable to
> disk could be made on a node-wide basis, based on the resources you really
> have, instead of the guess-work that we are forced to do today.
>

Re: Multi-tenancy, and authentication and authorization

Posted by David Boxenhorn <da...@lookin2.com>.
I think tuning of Cassandra is overly complex, and even with a single tenant
you can run into problems with too many CFs.

Right now there is a one-to-one mapping between memtables and SSTables.
Instead of that, would it be possible to have one giant memtable for each
Cassandra instance, with partial flushing to SSTs?

It seems to me like a single memtable would make it MUCH easier to tune
Cassandra, since the decision whether to (partially) flush the memtable to
disk could be made on a node-wide basis, based on the resources you really
have, instead of the guess-work that we are forced to do today.
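One way to picture the node-wide flush decision being proposed is as a toy model like the following (illustrative policy only, not actual Cassandra behavior): a single node-wide memory limit triggers flushes of the largest memtables first until usage fits.

```python
# Toy model of node-wide flush planning: per-CF thresholds disappear, and
# the only tunable is a node-wide limit that maps to the heap you really have.
def plan_flushes(cf_sizes: dict[str, int], node_limit: int) -> list[str]:
    """Return the CFs to flush, largest first, to get under node_limit."""
    total = sum(cf_sizes.values())
    to_flush = []
    for cf in sorted(cf_sizes, key=cf_sizes.get, reverse=True):
        if total <= node_limit:
            break
        to_flush.append(cf)
        total -= cf_sizes[cf]  # flushing frees this CF's memtable
    return to_flush
```

Real partial flushing would have to flush portions of one memtable rather than whole CFs, but the node-wide decision logic would look similar.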

Re: Multi-tenancy, and authentication and authorization

Posted by Aaron Morton <aa...@thelastpickle.com>.
I've used an S3-style data model with a REST interface (varnish > nginx > tornado > cassandra); users do not see anything remotely Cassandra-like.


Aaron


On 19 Jan 2011, at 10:27 AM, Stephen Connolly <st...@gmail.com> wrote:

I would imagine it to be somewhat easy to implement this via a thrift wrapper so that each tenant is connecting to the proxy thrift server that masks the fact that there are multiple tenants... or is that how people are thinking about this?
- Stephen
---
Sent from my Android phone, so random spelling mistakes, random nonsense words and other nonsense are a direct result of using swype to type on the screen
On 18 Jan 2011 21:20, "Aaron Morton" <aa...@thelastpickle.com> wrote:
> As everyone says, the issue is not with the keyspace directly, as it is just a container. It's the CFs in the keyspace, but let's just say keyspace because it's easier.
> 
> As things stand, if you allow point and click creation for keyspaces you will hand over control of the memory requirements to the users. This will be a bad thing. E.g. Lots of cf's will get created and you will run out of memory, or cf's will get created with huge Memtable settings and you will run out of memory, or caches will get set huge and you get the picture. One badly behaving keyspace or column family can take down a node / cluster.
> 
> IMHO currently the best way to share a Cassandra cluster is through some sort of application layer that uses a static keyspace. Others have a better understanding of the internals and may have ideas about how this could change in the future.
> 
> Aaron
> 
> On 19/01/2011, at 9:07 AM, Ed Anuff <ed...@anuff.com> wrote:
> 
>> Hi Jeremy, thanks, I was really coming at it from the question of whether keyspaces were a functional basis for multitenancy in Cassandra. I think the MT issues discussed on the wiki page are the main ones, but I'd like to get a better understanding of the core issue of keyspaces and then try to get that onto the page as maybe the first section.
>> 
>> Ed
>> 
>> On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna <je...@gmail.com> wrote:
>> Feel free to use that wiki page or another wiki page to collaborate on more pressing multi-tenant issues. The wiki is editable by all. The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT.
>> 
>> Obviously the memtable problem is the largest concern at this point. If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list.
>> 
>> A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra. If possible, make things pluggable and optional. The round robin request scheduler is an example. The functionality is there but you have to enable it. If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular).
>> 
>> Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects).
>> 
>> On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote:
>> 
>> > Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the most clean way to implement it. Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that. The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment.
>> >
>> > Ed
>> >
>> > On Tue, Jan 18, 2011 at 9:40 AM, indika kumara <in...@gmail.com> wrote:
>> > Hi Aaron,
>> >
>> > I have read some articles about Cassandra, and now understand a little bit about the trade-offs.
>> >
>> > I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that it should be possible to configure a keyspace based on the tenant’s class (e.g. replication factor). As per some resources, I feel that the issue is not in the number of keyspaces, but with the number of CFs, the number of rows in a CF, the number of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion.
>> >
>> > What would be the suitable approach? A keyspace per tenant (there would be a limit on the tenants per Cassandra cluster) or a keyspace for all tenants?
>> >
>> > I would still love to expose Cassandra ‘as-is’ to a tenant virtually, yet with acceptable memory consumption and performance.
>> >
>> > Thanks,
>> >
>> > Indika
>> >
>> >
>> 
>> 

Re: Multi-tenancy, and authentication and authorization

Posted by Stephen Connolly <st...@gmail.com>.
I would imagine it to be somewhat easy to implement this via a thrift
wrapper so that each tenant is connecting to the proxy thrift server that
masks the fact that there are multiple tenants... or is that how people are
thinking about this?

- Stephen

---
Sent from my Android phone, so random spelling mistakes, random nonsense
words and other nonsense are a direct result of using swype to type on the
screen
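A back-of-envelope version of the wrapper idea might look like this; the interface is invented for illustration (it mimics the shape of a Thrift client's `set_keyspace`, not the actual generated API):

```python
# Sketch of a tenant-masking proxy layer: the wrapper pins a tenant ID at
# connection time and rewrites keyspace names before delegating to the
# real client, so tenants never see each other's names.
class TenantMaskingClient:
    SEP = "__"  # illustrative separator

    def __init__(self, backend, tenant_id: str):
        self._backend = backend      # the real Cassandra client
        self._tenant_id = tenant_id  # fixed when the tenant authenticates

    def set_keyspace(self, keyspace: str) -> None:
        # The tenant asks for "Users"; the cluster sees "acme__Users".
        self._backend.set_keyspace(f"{self._tenant_id}{self.SEP}{keyspace}")
```

The same rewriting would have to be applied to every call that names a keyspace or column family, and to names returned in results.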
On 18 Jan 2011 21:20, "Aaron Morton" <aa...@thelastpickle.com> wrote:
> As everyone says, the issue is not with the keyspace directly, as it is
just a container. It's the CFs in the keyspace, but let's just say keyspace
because it's easier.
>
> As things stand, if you allow point and click creation for keyspaces you
will hand over control of the memory requirements to the users. This will be
a bad thing. E.g. Lots of cf's will get created and you will run out of
memory, or cf's will get created with huge Memtable settings and you will
run out of memory, or caches will get set huge and you get the picture. One
badly behaving keyspace or column family can take down a node / cluster.
>
> IMHO currently the best way to share a Cassandra cluster is through some
sort of application layer that uses a static keyspace. Others have a better
understanding of the internals and may have ideas about how this could
change in the future.
>
> Aaron
>
> On 19/01/2011, at 9:07 AM, Ed Anuff <ed...@anuff.com> wrote:
>
>> Hi Jeremy, thanks, I was really coming at it from the question of whether
keyspaces were a functional basis for multitenancy in Cassandra. I think the
MT issues discussed on the wiki page are the main ones, but I'd like to get a better
understanding of the core issue of keyspaces and then try to get that onto
the page as maybe the first section.
>>
>> Ed
>>
>> On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna <
jeremy.hanna1234@gmail.com> wrote:
>> Feel free to use that wiki page or another wiki page to collaborate on
more pressing multi-tenant issues. The wiki is editable by all. The
MultiTenant page was meant as a launching point for tracking progress on
things we could think of wrt MT.
>>
>> Obviously the memtable problem is the largest concern at this point. If
you have any ideas wrt that and want to collaborate on how to address that,
perhaps even in a way that would get accepted in core cassandra, feel free
to propose solutions in a jira ticket or on the list.
>>
>> A caveat to getting things into core cassandra - make sure anything you
do is considerate of single-tenant cassandra. If possible, make things
pluggable and optional. The round robin request scheduler is an example. The
functionality is there but you have to enable it. If it can't be made
pluggable/optional, you can get good feedback from the community about
proposed solutions in core Cassandra (like for the memtable issue in
particular).
>>
>> Anyway, just wanted to chime in with 2 cents about that page (since I
created it and was helping maintain it before getting pulled off onto other
projects).
>>
>> On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote:
>>
>> > Hi Indika, I've done a lot of work using the keyspace per tenant model,
and I'm seeing big problems with the memory consumption, even though it's
certainly the most clean way to implement it. Luckily, before I used the
keyspace per tenant approach, I'd implemented my system using a single
keyspace approach and can still revert back to that. The rest of the stuff
for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue
is a big concern at the moment.
>> >
>> > Ed
>> >
>> > On Tue, Jan 18, 2011 at 9:40 AM, indika kumara <in...@gmail.com>
wrote:
>> > Hi Aaron,
>> >
>> > I have read some articles about Cassandra, and now understand a little
bit about the trade-offs.
>> >
>> > I feel the goal should be to optimize memory as well as performance. I
have to consider the number of column families, the columns per family, the
number of rows, the memtable’s threshold, and so on. I also have to
consider how to maximize resource sharing among tenants. However, I feel
that it should be possible to configure a keyspace based on the tenant’s
class (e.g. replication factor). As per some resources, I feel that the
issue is not in the number of keyspaces, but with the number of CFs, the
number of rows in a CF, the number of columns, the size of the data in a
column, and so on. Am I correct? I appreciate your opinion.
>> >
>> > What would be the suitable approach? A keyspace per tenant (there would
be a limit on the tenants per Cassandra cluster) or a keyspace for all
tenants?
>> >
>> > I would still love to expose Cassandra ‘as-is’ to a tenant virtually,
yet with acceptable memory consumption and performance.
>> >
>> > Thanks,
>> >
>> > Indika
>> >
>> >
>>
>>

Re: Multi-tenancy, and authentication and authorization

Posted by indika kumara <in...@gmail.com>.
I do not have deep knowledge of Cassandra. As far as I know, there is no
such tool. I believe such a tool would be worthwhile.

Thanks,

Indika

On Thu, Jan 20, 2011 at 6:15 PM, Mimi Aluminium <mi...@gmail.com>wrote:

> Hi,
>
> I have a question that is somewhat related to the above.
> Is there a tool that predicts resource consumption (i.e., memory, disk,
> CPU) in an offline mode? That is, it would be given the storage conf
> parameters, ks, CFs, and data model, and then application parameters such as
> average read/write rates. It should output the required sizes for memory,
> disk, etc.
>
> I need to estimate costs for various configurations we might have, and
> thus I am working on building a "simple" Excel sheet for my own data model -
> but then it came to my mind to ask whether something like that already exists.
>
> BTW, I think such a tool could also help with the issues that were discussed
> before. Even though it would be built on averages, which probably are not so
> fine-grained, it could provide worst-case numbers to the application
> that uses Cassandra.
>
> Thanks,
> Miriam
>
>
> ==========
> Miriam Allalouf
>
> On Thu, Jan 20, 2011 at 1:53 PM, indika kumara <in...@gmail.com>wrote:
>
>> Thanks David.... We decided to do it at our client-side as the initial
>> implementation. I will investigate approaches for supporting fine-grained
>> control of the resources consumed by a server, tenant, and CF.
>>
>> Thanks,
>>
>> Indika
>>
>> On Thu, Jan 20, 2011 at 3:20 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>
>>> As far as I can tell, if Cassandra supports three levels of configuration
>>> (server, keyspace, column family) we can support multi-tenancy. It is
>>> trivial to give each tenant their own keyspace (e.g. just use the tenant's
>>> id as the keyspace name) and let them go wild. (Any out-of-bounds behavior
>>> on the CF level will be stopped at the keyspace and server level before
>>> doing any damage.)
>>>
>>> I don't think Cassandra needs to know about end-users. From Cassandra's
>>> point of view the tenant is the user.
>>>
>>> On Thu, Jan 20, 2011 at 7:00 AM, indika kumara <in...@gmail.com>wrote:
>>>
>>>> +1   Are there JIRAs for these requirements? I would like to contribute
>>>> within my capacity.
>>>>
>>>> As per my understanding, to support some multi-tenant models, it is
>>>> necessary to qualify keyspace names, CF names, etc., with the tenant
>>>> namespace (or id). The easiest way to do this would be to modify the
>>>> corresponding constructs transparently. I thought of a stage (optional
>>>> and configurable) prior to authorization. Are there any better solutions?
>>>> I appreciate the community's suggestions.
>>>>
>>>> Moreover, it is necessary to send the tenant NS (id) with the user
>>>> credentials (a user belongs to this tenant (org.)). For that purpose, I
>>>> thought of using the user credentials in the AuthenticationRequest. Is
>>>> there any better solution?
>>>>
>>>> I would like to have MT support at the Cassandra level that is
>>>> optional and configurable.
>>>>
>>>> Thanks,
>>>>
>>>> Indika
>>>>
>>>>
>>>> On Wed, Jan 19, 2011 at 7:40 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>>>
>>>>> Yes, the way I see it - and it becomes even more necessary for a
>>>>> multi-tenant configuration - there should be completely separate
>>>>> configurations for applications and for servers.
>>>>>
>>>>> - Application configuration is based on data and usage characteristics
>>>>> of your application.
>>>>> - Server configuration is based on the specific hardware limitations of
>>>>> the server.
>>>>>
>>>>> Obviously, server limitations take priority over application
>>>>> configuration.
>>>>>
>>>>> Assuming that each tenant in a multi-tenant environment gets one
>>>>> keyspace, you would also want to enforce limitations based on keyspace
>>>>> (which correspond to parameters that the tenant paid for).
>>>>>
>>>>> So now we have three levels:
>>>>>
>>>>> 1. Server configuration (top priority)
>>>>> 2. Keyspace configuration (paid-for service - second priority)
>>>>> 3. Column family configuration (configuration provided by tenant -
>>>>> third priority)
>>>>>
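The three-level priority just listed can be collapsed into one line; a sketch under the assumption that each level expresses the same limit in the same units (names are illustrative):

```python
# Toy resolution of the three configuration levels: a CF-level setting is
# honored only up to the keyspace's paid-for cap, which is itself clamped
# by the server's hard physical limit.
def effective_limit(cf_setting: int, keyspace_cap: int, server_cap: int) -> int:
    """Server limit > keyspace limit > column family setting."""
    return min(cf_setting, keyspace_cap, server_cap)
```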
>>>>>
>>>>> On Wed, Jan 19, 2011 at 3:15 PM, indika kumara <in...@gmail.com>wrote:
>>>>>
>>>>>> As the actual problem is mostly related to the number of CFs in the
>>>>>> system (and maybe the number of columns), I still believe that exposing
>>>>>> Cassandra ‘as-is’ to a tenant is doable and suitable, though it needs
>>>>>> some fixes.  That multi-tenancy model allows a tenant to use the
>>>>>> programming model of Cassandra ‘as-is’, enabling the seamless migration
>>>>>> of an application that uses Cassandra into the cloud. Moreover, in order
>>>>>> to support the different SLA requirements of different tenants, the
>>>>>> configurability of keyspaces, CFs, etc., per tenant may be critical.
>>>>>> However, there are trade-offs among usability, memory consumption, and
>>>>>> performance. I believe it is important to consider the SLA requirements
>>>>>> of different tenants when deciding on strategies for controlling
>>>>>> resource consumption.
>>>>>>
>>>>>> I like the idea of system-wide parameters for controlling resource
>>>>>> usage. I believe that tenant-specific parameters are equally important.
>>>>>> There are resources, and each tenant can claim a portion of them based
>>>>>> on SLA. For instance, if there is a threshold on the number of columns
>>>>>> per node, it should be possible to decide how many columns a particular
>>>>>> tenant can have.  It allows selecting a suitable Cassandra cluster for a
>>>>>> tenant based on his or her SLA. I believe the capability to configure
>>>>>> resource-controlling parameters per keyspace would be important to
>>>>>> support a keyspace per tenant model. Furthermore, in order to maximize
>>>>>> resource sharing among tenants, a threshold (on a resource) per keyspace
>>>>>> should not be a hard limit. Rather, it should oscillate between a hard
>>>>>> minimum and a maximum. For example, if a particular tenant needs more
>>>>>> resources at a given time, he or she should be able to borrow from the
>>>>>> others up to the maximum. The threshold is only considered when a tenant
>>>>>> is assigned to a cluster - the remaining resources of a cluster should
>>>>>> be equal to or higher than the resource limit of the tenant. It may be
>>>>>> necessary to spread a single keyspace across multiple clusters,
>>>>>> especially when there are not enough resources in a single cluster.
>>>>>>
>>>>>> I believe it would be better to have the flexibility to seamlessly
>>>>>> change multi-tenancy implementation models such as Cassandra ‘as-is’,
>>>>>> the keyspace per tenant model, a keyspace for all tenants, and so on.
>>>>>> Based on what I have learnt, each model requires adding a tenant id
>>>>>> (namespace) to a keyspace’s name, cf’s name, row key, or column’s name.
>>>>>> Would it be better to have a kind of pluggable handler that can access
>>>>>> those resources prior to doing the actual operation so that the required
>>>>>> changes can be done? Maybe prior to authorization.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Indika
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
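The pluggable-handler idea quoted in this post might look roughly like the following; every name here is hypothetical (Cassandra exposes no such hook), so this is only a sketch of the proposal:

```python
# Sketch of an optional, configurable rewrite stage that runs before
# authorization: it reads the tenant namespace out of the credentials and
# qualifies the resource name with it.
class TenantRewriteStage:
    def __init__(self, enabled: bool = True):
        self.enabled = enabled  # optional and configurable, per the proposal

    def before_authorize(self, credentials: dict, resource: str) -> str:
        tenant = credentials.get("tenant_ns")
        if not self.enabled or not tenant:
            return resource  # single-tenant deployments are unaffected
        return f"{tenant}__{resource}"
```

Because the stage sits in front of authorization, the permission check then operates on the already-qualified name, so one tenant can never be granted rights on another tenant's resources by name collision.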

Re: Multi-tenancy, and authentication and authorization

Posted by Mimi Aluminium <mi...@gmail.com>.
Hi,

I have a question that is somewhat related to the above.
Is there a tool that predicts resource consumption (i.e., memory, disk,
CPU) in an offline mode? That is, it would be given the storage conf
parameters, ks, CFs, and data model, and then application parameters such as
average read/write rates. It should output the required sizes for memory,
disk, etc.

I need to estimate costs for various configurations we might have, and
thus I am working on building a "simple" Excel sheet for my own data model -
but then it came to my mind to ask whether something like that already exists.

BTW, I think such a tool could also help with the issues that were discussed
before. Even though it would be built on averages, which probably are not so
fine-grained, it could provide worst-case numbers to the application
that uses Cassandra.

Thanks,
Miriam


==========
Miriam Allalouf
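Absent such a tool, the dominant memory term in this thread can at least be bounded with simple arithmetic; the formula and the overhead factor below are rough assumptions for illustration, not Cassandra constants:

```python
# Back-of-envelope memtable memory estimate for a multi-tenant node.
# overhead_factor approximates JVM object overhead on top of raw data;
# 2.0 is a guess for illustration, not a measured value.
def estimated_memtable_mb(num_cfs: int, threshold_mb: int,
                          overhead_factor: float = 2.0) -> float:
    """Worst case: every CF's memtable is full at its flush threshold."""
    return num_cfs * threshold_mb * overhead_factor
```

For example, 40 tenants with 5 CFs each at a 64 MB threshold gives estimated_memtable_mb(200, 64) = 25600 MB, which makes the per-keyspace memory concern in this thread concrete.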

On Thu, Jan 20, 2011 at 1:53 PM, indika kumara <in...@gmail.com>wrote:

> Thanks David.... We decided to do it at our client-side as the initial
> implementation. I will investigate approaches for supporting fine-grained
> control of the resources consumed by a server, tenant, and CF.
>
> Thanks,
>
> Indika
>
> On Thu, Jan 20, 2011 at 3:20 PM, David Boxenhorn <da...@lookin2.com>wrote:
>
>> As far as I can tell, if Cassandra supports three levels of configuration
>> (server, keyspace, column family) we can support multi-tenancy. It is
>> trivial to give each tenant their own keyspace (e.g. just use the tenant's
>> id as the keyspace name) and let them go wild. (Any out-of-bounds behavior
>> on the CF level will be stopped at the keyspace and server level before
>> doing any damage.)
>>
>> I don't think Cassandra needs to know about end-users. From Cassandra's
>> point of view the tenant is the user.
>>
>> On Thu, Jan 20, 2011 at 7:00 AM, indika kumara <in...@gmail.com>wrote:
>>
>>> +1   Are there JIRAs for these requirements? I would like to contribute
>>> within my capacity.
>>>
>>> As per my understanding, to support some multi-tenant models, it is
>>> necessary to qualify keyspace names, CF names, etc., with the tenant
>>> namespace (or id). The easiest way to do this would be to modify the
>>> corresponding constructs transparently. I thought of a stage (optional and
>>> configurable) prior to authorization. Are there any better solutions? I
>>> appreciate the community's suggestions.
>>>
>>> Moreover, it is necessary to send the tenant NS (id) with the user
>>> credentials (a user belongs to this tenant (org.)). For that purpose, I
>>> thought of using the user credentials in the AuthenticationRequest. Is
>>> there any better solution?
>>>
>>> I would like to have MT support at the Cassandra level that is
>>> optional and configurable.
>>>
>>> Thanks,
>>>
>>> Indika
>>>
>>>
>>> On Wed, Jan 19, 2011 at 7:40 PM, David Boxenhorn <da...@lookin2.com>wrote:
>>>
>>>> Yes, the way I see it - and it becomes even more necessary for a
>>>> multi-tenant configuration - there should be completely separate
>>>> configurations for applications and for servers.
>>>>
>>>> - Application configuration is based on data and usage characteristics
>>>> of your application.
>>>> - Server configuration is based on the specific hardware limitations of
>>>> the server.
>>>>
>>>> Obviously, server limitations take priority over application
>>>> configuration.
>>>>
>>>> Assuming that each tenant in a multi-tenant environment gets one
>>>> keyspace, you would also want to enforce limitations based on keyspace
>>>> (which correspond to parameters that the tenant paid for).
>>>>
>>>> So now we have three levels:
>>>>
>>>> 1. Server configuration (top priority)
>>>> 2. Keyspace configuration (paid-for service - second priority)
>>>> 3. Column family configuration (configuration provided by tenant - third
>>>> priority)
>>>>
>>>>
>>>> On Wed, Jan 19, 2011 at 3:15 PM, indika kumara <in...@gmail.com>wrote:
>>>>
>>>>> As the actual problem is mostly related to the number of CFs in the
>>>>> system (and maybe the number of columns), I still believe that exposing
>>>>> Cassandra ‘as-is’ to a tenant is doable and suitable, though it needs
>>>>> some fixes.  That multi-tenancy model allows a tenant to use the
>>>>> programming model of Cassandra ‘as-is’, enabling the seamless migration
>>>>> of an application that uses Cassandra into the cloud. Moreover, in order
>>>>> to support the different SLA requirements of different tenants, the
>>>>> configurability of keyspaces, CFs, etc., per tenant may be critical.
>>>>> However, there are trade-offs among usability, memory consumption, and
>>>>> performance. I believe it is important to consider the SLA requirements
>>>>> of different tenants when deciding on strategies for controlling
>>>>> resource consumption.
>>>>>
>>>>> I like the idea of system-wide parameters for controlling resource
>>>>> usage. I believe that tenant-specific parameters are equally important.
>>>>> There are resources, and each tenant can claim a portion of them based
>>>>> on SLA. For instance, if there is a threshold on the number of columns
>>>>> per node, it should be possible to decide how many columns a particular
>>>>> tenant can have.  It allows selecting a suitable Cassandra cluster for a
>>>>> tenant based on his or her SLA. I believe the capability to configure
>>>>> resource-controlling parameters per keyspace would be important to
>>>>> support a keyspace per tenant model. Furthermore, in order to maximize
>>>>> resource sharing among tenants, a threshold (on a resource) per keyspace
>>>>> should not be a hard limit. Rather, it should oscillate between a hard
>>>>> minimum and a maximum. For example, if a particular tenant needs more
>>>>> resources at a given time, he or she should be able to borrow from the
>>>>> others up to the maximum. The threshold is only considered when a tenant
>>>>> is assigned to a cluster - the remaining resources of a cluster should
>>>>> be equal to or higher than the resource limit of the tenant. It may be
>>>>> necessary to spread a single keyspace across multiple clusters,
>>>>> especially when there are not enough resources in a single cluster.
>>>>>
>>>>> I believe that it would be better to have a flexibility to change
>>>>> seamlessly multi-tenancy implementation models such as the Cassadra ‘as-is’,
>>>>> the keyspace per tenant model, a keyspace for all tenants, and so on.  Based
>>>>> on what I have learnt, each model requires adding tenant id (name space) to
>>>>> a keyspace’s name or cf’s name or raw key, or column’s name.  Would it be
>>>>> better to have a kind of pluggable handler that can access those resources
>>>>> prior to doing the actual operation so that the required changes can be
>>>>> done? May be prior to authorization.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Indika
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Multi-tenancy, and authentication and authorization

Posted by David Boxenhorn <da...@lookin2.com>.
I have added my comments to this issue:

https://issues.apache.org/jira/browse/CASSANDRA-2006

Good luck!


Re: Multi-tenancy, and authentication and authorization

Posted by indika kumara <in...@gmail.com>.
Thanks David.... We decided to do it on our client side as the initial
implementation. I will investigate approaches for supporting fine-grained
control of the resources consumed by a server, a tenant, and a CF.

Thanks,

Indika


Re: Multi-tenancy, and authentication and authorization

Posted by David Boxenhorn <da...@lookin2.com>.
As far as I can tell, if Cassandra supports three levels of configuration
(server, keyspace, column family) we can support multi-tenancy. It is
trivial to give each tenant their own keyspace (e.g. just use the tenant's
id as the keyspace name) and let them go wild. (Any out-of-bounds behavior
on the CF level will be stopped at the keyspace and server level before
doing any damage.)

I don't think Cassandra needs to know about end-users. From Cassandra's
point of view the tenant is the user.


Re: Multi-tenancy, and authentication and authorization

Posted by indika kumara <in...@gmail.com>.
+1   Are there JIRAs for these requirements? I would like to contribute in
whatever capacity I can.

As I understand it, to support some multi-tenant models, keyspace names, CF
names, etc. need to be qualified with the tenant namespace (or ID). The
easiest way to do this would be to modify the corresponding constructs
transparently. I thought of a stage (optional and configurable) prior to
authorization. Are there any better solutions? I appreciate the community's
suggestions.

Moreover, the tenant namespace (ID) needs to be sent with the user credentials
(a user belongs to a tenant, i.e. an organization). For that purpose, I
thought of using the user credentials in the AuthenticationRequest. Is there
any better solution?

I would like multi-tenancy support at the Cassandra level to be optional
and configurable.

Thanks,

Indika


Re: Multi-tenancy, and authentication and authorization

Posted by David Boxenhorn <da...@lookin2.com>.
Yes, the way I see it - and it becomes even more necessary for a
multi-tenant configuration - there should be completely separate
configurations for applications and for servers.

- Application configuration is based on data and usage characteristics of
your application.
- Server configuration is based on the specific hardware limitations of the
server.

Obviously, server limitations take priority over application configuration.

Assuming that each tenant in a multi-tenant environment gets one keyspace,
you would also want to enforce limitations per keyspace (which correspond to
the parameters the tenant paid for).

So now we have three levels:

1. Server configuration (top priority)
2. Keyspace configuration (paid-for service - second priority)
3. Column family configuration (configuration provided by the tenant - third
priority)
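The three levels above can be sketched as a simple cap chain; this is a
hypothetical illustration (the function name and megabyte figures are
assumptions, not actual Cassandra configuration):

```python
# Hypothetical sketch: resolve an effective limit by letting each
# higher-priority level cap the one below it.

def effective_limit(server_cap, keyspace_cap=None, cf_request=None):
    """Server cap (hard) > keyspace cap (paid-for) > CF request (tenant)."""
    limit = server_cap
    if keyspace_cap is not None:
        limit = min(limit, keyspace_cap)
    if cf_request is not None:
        limit = min(limit, cf_request)
    return limit

# A tenant requesting a 512 MB memtable is capped by its 256 MB keyspace
# plan, which in turn sits under a 1024 MB server limit.
print(effective_limit(1024, keyspace_cap=256, cf_request=512))  # -> 256
```

Whatever the tenant asks for at the CF level can never exceed what the
keyspace was paid for, and neither can exceed what the server can bear.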



Re: Multi-tenancy, and authentication and authorization

Posted by indika kumara <in...@gmail.com>.
As the actual problem mostly concerns the number of CFs in the system (and
perhaps the number of columns), I still believe that exposing Cassandra
‘as-is’ to a tenant is doable and suitable, though it needs some fixes. That
multi-tenancy model allows a tenant to use Cassandra’s programming model
‘as-is’, enabling the seamless migration of an application that uses
Cassandra into the cloud. Moreover, in order to support the different SLA
requirements of different tenants, the configurability of keyspaces, CFs,
etc. per tenant may be critical. However, there are trade-offs among
usability, memory consumption, and performance, and I believe it is
important to consider the SLA requirements of different tenants when
deciding on strategies for controlling resource consumption.

I like the idea of system-wide parameters for controlling resource usage,
and I believe tenant-specific parameters are equally important. There is a
pool of resources, and each tenant can claim a portion of it based on its
SLA. For instance, if there is a threshold on the number of columns per
node, it should be possible to decide how many columns a particular tenant
can have. That allows selecting a suitable Cassandra cluster for a tenant
based on its SLA. I believe the capability to configure resource-controlling
parameters per keyspace would be important for supporting a
keyspace-per-tenant model. Furthermore, in order to maximize resource
sharing among tenants, a per-keyspace threshold on a resource should not be
a hard limit; rather, it should oscillate between a guaranteed minimum and a
hard maximum. For example, if a particular tenant needs more resources at a
given time, it should be able to borrow from the others up to the maximum.
The threshold is only considered when a tenant is assigned to a cluster: the
cluster’s remaining resources should be equal to or greater than the
tenant’s resource limit. It may be necessary to spread a single keyspace
across multiple clusters, especially when no single cluster has enough
resources.
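A minimal sketch of such a soft quota, assuming a guaranteed minimum, a hard
maximum, and a measure of spare cluster capacity (all names and numbers here
are illustrative, not an actual Cassandra mechanism):

```python
# Hypothetical sketch: a soft per-keyspace quota that floats between a
# guaranteed minimum and a hard maximum, borrowing from spare cluster
# capacity when the tenant needs more than its guarantee.

def grant(requested, guaranteed_min, hard_max, cluster_free):
    """Always honor up to the guaranteed minimum; beyond it, borrow only
    what the cluster has free, never exceeding the tenant's hard maximum."""
    base = min(requested, guaranteed_min)
    extra = min(max(requested - guaranteed_min, 0), cluster_free)
    return min(base + extra, hard_max)

# Guaranteed 50 units, hard max 100, 20 units free in the cluster:
print(grant(80, guaranteed_min=50, hard_max=100, cluster_free=20))  # -> 70
```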

I believe it would be better to have the flexibility to switch seamlessly
between multi-tenancy implementation models such as Cassandra ‘as-is’, a
keyspace per tenant, a single keyspace for all tenants, and so on. From what
I have learned, each model requires adding a tenant ID (namespace) to a
keyspace’s name, a CF’s name, a row key, or a column’s name. Would it be
better to have a kind of pluggable handler that can access those resources
prior to the actual operation so that the required changes can be made?
Perhaps prior to authorization.

Thanks,

Indika
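The pluggable handler proposed above could look roughly like this; the class
name, separator, and client-side placement are all assumptions for the sake
of illustration:

```python
# Hypothetical sketch of the pluggable-handler idea: qualify keyspace/CF/
# column names with a tenant namespace before the real operation runs
# (e.g. just before authorization).

class TenantNamespacingHandler:
    def __init__(self, tenant_id, sep="_"):
        self.prefix = tenant_id + sep

    def qualify(self, name):
        """Rewrite the tenant-visible name to the stored name."""
        return self.prefix + name

    def unqualify(self, stored):
        """Map a stored name back to what the tenant sees."""
        if stored.startswith(self.prefix):
            return stored[len(self.prefix):]
        return stored

h = TenantNamespacingHandler("tenant42")
print(h.qualify("Keyspace1"))       # -> tenant42_Keyspace1
print(h.unqualify("tenant42_CF1"))  # -> CF1
```

The same pair of rewrites works for any of the models: apply it to keyspace
names, CF names, or row keys depending on which level carries the tenant ID.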

Re: Multi-tenancy, and authentication and authorization

Posted by Aaron Morton <aa...@thelastpickle.com>.
As everyone says, it's not an issue with the keyspace directly, as a keyspace is just a container; it's the CFs in the keyspace. But let's just say keyspace because it's easier.

As things stand, if you allow point and click creation for keyspaces you will hand over control of the memory requirements to the users. This will be a bad thing. E.g. Lots of cf's will get created and you will run out of memory, or cf's will get created with huge Memtable settings and you will run out of memory, or caches will get set huge and you get the picture. One badly behaving keyspace or column family can take down a node / cluster.

IMHO, currently the best way to share a Cassandra cluster is through some sort of application layer that uses a static keyspace. Others have a better understanding of the internals and may have ideas about how this could change in the future.

Aaron
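The static-keyspace approach Aaron describes (and the S3-style namespacing
mentioned earlier in the thread) amounts to separating tenants in the row
keys themselves; a sketch, where the "/" separator and helper names are
illustrative:

```python
# Sketch of the application-layer approach: one static shared keyspace,
# with tenants separated by namespacing row keys, in the spirit of
# S3-style key prefixes.

def tenant_row_key(tenant_id, key):
    """Build a namespaced row key for the shared keyspace."""
    return f"{tenant_id}/{key}"

def split_row_key(row_key):
    """Recover the tenant ID and the tenant-visible key."""
    tenant_id, _, key = row_key.partition("/")
    return tenant_id, key

print(tenant_row_key("acme", "user:123"))  # -> acme/user:123
print(split_row_key("acme/user:123"))      # -> ('acme', 'user:123')
```

Because every tenant lives in the same keyspace and CFs, the memtable and
cache settings stay under the operator's control rather than the tenants'.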

On 19/01/2011, at 9:07 AM, Ed Anuff <ed...@anuff.com> wrote:

> Hi Jeremy, thanks, I was really coming at it from the question of whether keyspaces are a functional basis for multi-tenancy in Cassandra.  I think the MT issues discussed on the wiki page are the right ones, but I'd like to get a better understanding of the core issue of keyspaces and then try to get that onto the page, maybe as the first section.
> 
> Ed
> 
> On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna <je...@gmail.com> wrote:
> Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues.  The wiki is editable by all.  The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT.
> 
> Obviously the memtable problem is the largest concern at this point.  If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list.
> 
> A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra.  If possible, make things pluggable and optional.  The round robin request scheduler is an example.  The functionality is there but you have to enable it.  If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular).
> 
> Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects).
> 
> On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote:
> 
> > Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the cleanest way to implement it.  Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that.  The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment.
> >
> > Ed
> >
> > On Tue, Jan 18, 2011 at 9:40 AM, indika kumara <in...@gmail.com> wrote:
> > Hi Aaron,
> >
> > I read some articles about Cassandra, and now understand a little bit about the trade-offs.
> >
> > I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g., replication factor). From what I have read, the issue seems to be not the number of keyspaces, but the number of CFs, the number of rows per CF, the number of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion.
> >
> > Which would be the more suitable approach: a keyspace per tenant (there would be a limit on the number of tenants per Cassandra cluster) or a keyspace for all tenants?
> >
> > I still would love to expose Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance.
> >
> > Thanks,
> >
> > Indika
> >
> >
> 
> 

Re: Multi-tenancy, and authentication and authorization

Posted by Ed Anuff <ed...@anuff.com>.
Hi Jeremy, thanks, I was really coming at it from the question of whether
keyspaces were a functional basis for multitenancy in Cassandra.  I think
the MT issues discussed on the wiki page are the right ones, but I'd like
to get a better understanding of the core issue of keyspaces and then try
to get that onto the page as maybe the first section.

Ed

On Tue, Jan 18, 2011 at 11:42 AM, Jeremy Hanna
<je...@gmail.com>wrote:

> Feel free to use that wiki page or another wiki page to collaborate on more
> pressing multi tenant issues.  The wiki is editable by all.  The MultiTenant
> page was meant as a launching point for tracking progress on things we could
> think of wrt MT.
>
> Obviously the memtable problem is the largest concern at this point.  If
> you have any ideas wrt that and want to collaborate on how to address that,
> perhaps even in a way that would get accepted in core cassandra, feel free
> to propose solutions in a jira ticket or on the list.
>
> A caveat to getting things into core cassandra - make sure anything you do
> is considerate of single-tenant cassandra.  If possible, make things
> pluggable and optional.  The round robin request scheduler is an example.
>  The functionality is there but you have to enable it.  If it can't be made
> pluggable/optional, you can get good feedback from the community about
> proposed solutions in core Cassandra (like for the memtable issue in
> particular).
>
> Anyway, just wanted to chime in with 2 cents about that page (since I
> created it and was helping maintain it before getting pulled off onto other
> projects).
>
> On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote:
>
> > Hi Indika, I've done a lot of work using the keyspace per tenant model,
> and I'm seeing big problems with the memory consumption, even though it's
> certainly the cleanest way to implement it.  Luckily, before I used the
> keyspace per tenant approach, I'd implemented my system using a single
> keyspace approach and can still revert back to that.  The rest of the stuff
> for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue
> is a big concern at the moment.
> >
> > Ed
> >
> > On Tue, Jan 18, 2011 at 9:40 AM, indika kumara <in...@gmail.com>
> wrote:
> > Hi Aaron,
> >
> > I read some articles about Cassandra, and now understand a little bit
> about the trade-offs.
> >
> > I feel the goal should be to optimize memory as well as performance. I
> have to consider the number of column families, the columns per family,
> the number of rows, the memtable’s threshold, and so on. I also have to
> consider how to maximize resource sharing among tenants. However, I feel
> that a keyspace should be able to be configured based on the tenant’s
> class (e.g., replication factor). From what I have read, the issue seems
> to be not the number of keyspaces, but the number of CFs, the number of
> rows per CF, the number of columns, the size of the data in a column, and
> so on. Am I correct? I appreciate your opinion.
> >
> > Which would be the more suitable approach: a keyspace per tenant (there
> would be a limit on the number of tenants per Cassandra cluster) or a
> keyspace for all tenants?
> >
> > I still would love to expose Cassandra ‘as-is’ to a tenant virtually
> yet with acceptable memory consumption and performance.
> >
> > Thanks,
> >
> > Indika
> >
> >
>
>

Re: Multi-tenancy, and authentication and authorization

Posted by Jeremy Hanna <je...@gmail.com>.
Feel free to use that wiki page or another wiki page to collaborate on more pressing multi tenant issues.  The wiki is editable by all.  The MultiTenant page was meant as a launching point for tracking progress on things we could think of wrt MT.

Obviously the memtable problem is the largest concern at this point.  If you have any ideas wrt that and want to collaborate on how to address that, perhaps even in a way that would get accepted in core cassandra, feel free to propose solutions in a jira ticket or on the list.

A caveat to getting things into core cassandra - make sure anything you do is considerate of single-tenant cassandra.  If possible, make things pluggable and optional.  The round robin request scheduler is an example.  The functionality is there but you have to enable it.  If it can't be made pluggable/optional, you can get good feedback from the community about proposed solutions in core Cassandra (like for the memtable issue in particular).

Anyway, just wanted to chime in with 2 cents about that page (since I created it and was helping maintain it before getting pulled off onto other projects).

On Jan 18, 2011, at 1:12 PM, Ed Anuff wrote:

> Hi Indika, I've done a lot of work using the keyspace per tenant model, and I'm seeing big problems with the memory consumption, even though it's certainly the cleanest way to implement it.  Luckily, before I used the keyspace per tenant approach, I'd implemented my system using a single keyspace approach and can still revert back to that.  The rest of the stuff for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue is a big concern at the moment.
> 
> Ed
> 
> On Tue, Jan 18, 2011 at 9:40 AM, indika kumara <in...@gmail.com> wrote:
> Hi Aaron,
> 
> I read some articles about Cassandra, and now understand a little bit about the trade-offs.
> 
> I feel the goal should be to optimize memory as well as performance. I have to consider the number of column families, the columns per family, the number of rows, the memtable’s threshold, and so on. I also have to consider how to maximize resource sharing among tenants. However, I feel that a keyspace should be able to be configured based on the tenant’s class (e.g., replication factor). From what I have read, the issue seems to be not the number of keyspaces, but the number of CFs, the number of rows per CF, the number of columns, the size of the data in a column, and so on. Am I correct? I appreciate your opinion.
> 
> Which would be the more suitable approach: a keyspace per tenant (there would be a limit on the number of tenants per Cassandra cluster) or a keyspace for all tenants?
> 
> I still would love to expose Cassandra ‘as-is’ to a tenant virtually yet with acceptable memory consumption and performance.
> 
> Thanks,
> 
> Indika
> 
> 


Re: Multi-tenancy, and authentication and authorization

Posted by Ed Anuff <ed...@anuff.com>.
Hi Indika, I've done a lot of work using the keyspace per tenant model, and
I'm seeing big problems with the memory consumption, even though it's
certainly the cleanest way to implement it.  Luckily, before I used the
keyspace per tenant approach, I'd implemented my system using a single
keyspace approach and can still revert back to that.  The rest of the stuff
for multi-tenancy on the wiki is largely irrelevant, but the keyspace issue
is a big concern at the moment.

Ed

On Tue, Jan 18, 2011 at 9:40 AM, indika kumara <in...@gmail.com>wrote:

> Hi Aaron,
>
> I read some articles about Cassandra, and now understand a little bit
> about the trade-offs.
>
> I feel the goal should be to optimize memory as well as performance. I have
> to consider the number of column families, the columns per family, the
> number of rows, the memtable’s threshold, and so on. I also have to
> consider how to maximize resource sharing among tenants. However, I feel
> that a keyspace should be able to be configured based on the tenant’s
> class (e.g., replication factor). From what I have read, the issue seems
> to be not the number of keyspaces, but the number of CFs, the number of
> rows per CF, the number of columns, the size of the data in a column, and
> so on. Am I correct? I appreciate your opinion.
>
> Which would be the more suitable approach: a keyspace per tenant (there
> would be a limit on the number of tenants per Cassandra cluster) or a
> keyspace for all tenants?
>
> I still would love to expose Cassandra ‘as-is’ to a tenant virtually
> yet with acceptable memory consumption and performance.
>
> Thanks,
>
> Indika
>
>

Re: Multi-tenancy, and authentication and authorization

Posted by indika kumara <in...@gmail.com>.
Hi Aaron,

I read some articles about Cassandra, and now understand a little bit
about the trade-offs.

I feel the goal should be to optimize memory as well as performance. I have
to consider the number of column families, the columns per family, the
number of rows, the memtable’s threshold, and so on. I also have to
consider how to maximize resource sharing among tenants. However, I feel
that a keyspace should be able to be configured based on the tenant’s class
(e.g., replication factor). From what I have read, the issue seems to be
not the number of keyspaces, but the number of CFs, the number of rows per
CF, the number of columns, the size of the data in a column, and so on. Am
I correct? I appreciate your opinion.

Which would be the more suitable approach: a keyspace per tenant (there
would be a limit on the number of tenants per Cassandra cluster) or a
keyspace for all tenants?

I still would love to expose Cassandra ‘as-is’ to a tenant virtually yet
with acceptable memory consumption and performance.

Thanks,

Indika

Re: Multi-tenancy, and authentication and authorization

Posted by indika kumara <in...@gmail.com>.
Hi Aaron,

I appreciate your help. I am a newbie to Cassandra and have just begun to
study the code base.

Do you suggest the following approach?

*1) No changes to either keyspace names or column family names, but the
row key would be ‘the actual row key’ + 'tenant ID'. Separate mappings for
keyspace vs. tenants and column family vs. tenants are needed (these can be
a form of authorization).*

2) *Keep a keyspace per tenant, yet virtually expose as many keyspaces as
needed.*

3) *A single keyspace for all tenants.*

What do you mean by 'use namespaces in the keys'?  Can a key be a QName?
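If it means composing the row key from namespace segments plus the actual key, then a key can effectively be a qualified name, since row keys are just strings or bytes. A sketch of what I imagine (the delimiter choice and helper names are my own assumptions):

```python
# Sketch of namespaced row keys: a key is a qualified name built from
# segments, e.g. tenant:cf-namespace:actual-key. The ':' delimiter is an
# illustrative assumption, not anything prescribed by Cassandra.

def make_key(*segments):
    # Reject segments containing the delimiter so keys parse unambiguously.
    for s in segments:
        if ":" in s:
            raise ValueError("segments must not contain the delimiter")
    return ":".join(segments)

def split_key(key):
    # Recover the namespace segments from a qualified key.
    return key.split(":")

key = make_key("acme", "orders", "2011-01-18")  # 'acme:orders:2011-01-18'
```

With such a scheme, range queries under an order-preserving partitioner would also keep each tenant's rows contiguous, since they share a common prefix.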

Thanks,

Indika


On Tue, Jan 18, 2011 at 5:26 PM, indika kumara <in...@gmail.com>wrote:

> Moving to user list
>
>
> On Tue, Jan 18, 2011 at 4:05 PM, Aaron Morton <aa...@thelastpickle.com>wrote:
>
>> Have a read about JVM heap sizing here
>> http://wiki.apache.org/cassandra/MemtableThresholds
>>
>> If you let people create keyspaces with a mouse click you will soon run
>> out of memory.
>>
>> I use Cassandra to provide a self service "storage service" at my
>> organization. All virtual databases operate in the same Cassandra keyspace
>> (which does not change), and I use namespaces in the keys to separate
>> things. Take a look at how Amazon S3 works; it may give you some ideas.
>>
>> If you want to continue this discussion, let's move it to the user list.
>>
>> A
>>
>>
>> On 17/01/2011, at 7:44 PM, indika kumara <in...@gmail.com> wrote:
>>
>> > Hi Stu,
>> >
>> > In our app, we would like to offer Cassandra 'as-is' to tenants. In
>> that
>> > case, each tenant should be able to create Keyspaces as needed. Based on
>> the
>> > authorization, I expect to implement it. In my view, the implementation
>> > options are as follows.
>> >
>> > 1) The name of a keyspace would be 'the actual keyspace name' + 'tenant
>> ID'
>> >
>> > 2) The name of a keyspace would not be changed, but the name of a column
>> > family would be the 'the actual column family name' + 'tenant ID'.  It
>> is
>> > needed to keep a separate mapping for keyspace vs tenants.
>> >
>> > 3) The name of a keyspace or a column family would not be changed, but
>> the
>> > name of a column would be 'the actual column name' + 'tenant ID'. It is
>> > needed to keep separate mappings for keyspace vs tenants and column
>> family
>> > vs tenants
>> >
>> > Could you please give your opinions on the above three options?  If
>> there
>> > are any issues regarding the above approaches and if those issues can be
>> solved,
>> > I would love to contribute to that.
>> >
>> > Thanks,
>> >
>> > Indika
>> >
>> >
>> > On Fri, Jan 7, 2011 at 11:22 AM, Stu Hood <st...@gmail.com> wrote:
>> >
>> >>> (1) has the problem of multiple memtables (a large amount just isn't
>> >> viable
>> >> There are some very straightforward solutions to this particular
>> problem: I
>> >> wouldn't rule out running with a very large number of
>> >> keyspace/columnfamilies given some minor changes.
>> >>
>> >> As Brandon said, some of the folks that were working on multi-tenancy
>> for
>> >> Cassandra are no longer focused on it. But the code that was generated
>> >> during our efforts is very much available, and is unlikely to have gone
>> >> stale. Would love to talk about this with you.
>> >>
>> >> Thanks,
>> >> Stu
>> >>
>> >> On Thu, Jan 6, 2011 at 8:08 PM, indika kumara <in...@gmail.com>
>> >> wrote:
>> >>
>> >>> Thank you very much Brandon!
>> >>>
>> >>> On Fri, Jan 7, 2011 at 12:40 AM, Brandon Williams <dr...@gmail.com>
>> >>> wrote:
>> >>>
>> >>>> On Thu, Jan 6, 2011 at 12:33 PM, indika kumara <
>> indika.kuma@gmail.com
>> >>>>> wrote:
>> >>>>
>> >>>>> Hi Brandon,
>> >>>>>
>> >>>>> I would like you feedback on my two ideas for implementing mufti
>> >>> tenancy
>> >>>>> with the existing implementation.  Would those be possible to
>> >>> implement?
>> >>>>>
>> >>>>> Thanks,
>> >>>>>
>> >>>>> Indika
>> >>>>>
>> >>>>>>>>>> Two vague ideas: (1) qualified keyspaces (by the tenant domain)
>> >>> (2)
>> >>>>> multiple Cassandra storage configurations in a single node (one per
>> >>>>> tenant).
>> >>>>> For both options, the resource hierarchy would be /cassandra/
>> >>>>> <cluster_name>/<tenant name (domain)>/keyspaces/<ks_name>/
>> >>>>>
>> >>>>
>> >>>> (1) has the problem of multiple memtables (a large amount just isn't
>> >>> viable
>> >>>> right now.)  (2) more or less has the same problem, but in JVM
>> >> instances.
>> >>>>
>> >>>> I would suggest a) not trying to offer cassandra itself, and instead
>> >>> build
>> >>>> a
>> >>>> service that uses cassandra under the hood, and b) splitting up
>> tenants
>> >>> in
>> >>>> this layer.
>> >>>>
>> >>>> -Brandon
>> >>>>
>> >>>
>> >>
>>
>
>