Posted to dev@cassandra.apache.org by jason zhao yang <zh...@gmail.com> on 2016/08/06 04:33:43 UTC

Re: Support Multi-Tenant in Cassandra

We considered splitting by keyspace or by table before, but a Cassandra table
is a costly structure (more CPU, flushes, memory...).

In our use case, we expect to have more than 50 tenants on the same cluster.
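
To make the cost concern concrete, here is a minimal sketch of what a keyspace-per-tenant layout implies at this scale. The keyspace naming scheme (`tenant_<n>`), the two example tables, and the replication settings are all hypothetical, chosen only for illustration; the script just builds CQL strings and counts them.

```python
def tenant_ddl(tenant_ids, table_defs, replication_factor=3):
    """Return CQL statements that create one keyspace per tenant,
    each holding the same set of tables."""
    statements = []
    for tenant in tenant_ids:
        keyspace = f"tenant_{tenant}"  # hypothetical naming scheme
        statements.append(
            f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
            f"{{'class': 'SimpleStrategy', 'replication_factor': {replication_factor}}};"
        )
        for table, columns in table_defs.items():
            statements.append(
                f"CREATE TABLE IF NOT EXISTS {keyspace}.{table} ({columns});"
            )
    return statements


# Hypothetical shared schema: every tenant gets the same two tables.
tables = {
    "orders": "order_id uuid PRIMARY KEY, total decimal",
    "events": "event_id timeuuid PRIMARY KEY, payload text",
}

ddl = tenant_ddl(range(50), tables)
table_count = sum(1 for s in ddl if s.startswith("CREATE TABLE"))
print(table_count)
```

With 50 tenants and only two tables in the shared schema, the cluster already carries 100 tables, and the count grows linearly with both numbers.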

> As it was already mentioned in the ticket itself, filtering is a highly inefficient
> operation.
I totally agree, but it is still good to have data filtered on the server side
rather than on the client side.
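
The inefficiency being discussed can be illustrated with a toy routing model. This is a sketch under simplifying assumptions, not Cassandra's actual partitioner: the node list is made up, MD5 stands in for the real token function, and replication is ignored so each partition has exactly one owner.

```python
import hashlib

# Hypothetical six-node cluster.
NODES = ["node1", "node2", "node3", "node4", "node5", "node6"]


def owner(partition_key):
    """Map a complete partition key to one node by hashing it
    (a stand-in for a real partitioner)."""
    digest = hashlib.md5(repr(partition_key).encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


def nodes_for_full_key(tenant_id, item_id):
    # The whole composite partition key is known, so the request can be
    # routed straight to the owning node.
    return {owner((tenant_id, item_id))}


def nodes_for_tenant_scan(tenant_id):
    # Only tenant_id is known. The hash covers the whole composite key, so
    # the owner cannot be computed; every node must be asked and the rows
    # filtered on tenant_id.
    return set(NODES)


print(len(nodes_for_full_key("tenant_a", 42)))
print(len(nodes_for_tenant_scan("tenant_a")))
```

In this model a lookup by full key touches one node, while "all data of one tenant" touches every node regardless of how little data that tenant owns. Filtering on the server side saves network transfer to the client, but not the cluster-wide scan.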

How about adding a logical tenant concept to Cassandra? All logical
tenants would share the same table schemas, while their queries and
storage would be kept separate.


On Fri, Jul 15, 2016 at 4:28 PM, Oleksandr Petrov <ol...@gmail.com> wrote:

> There's a ticket on filtering (#11031), although I would not count on
> filtering in production.
>
> As it was already mentioned in the ticket itself, filtering is a highly
> inefficient operation. It was intended as an aid for people who are exploring
> data and/or can structure a query in such a way that it will at least be
> local (for example, with an IN or EQ query on the partition key and filtering
> out results from a small partition). However, filtering on the partition
> key means that _every_ replica has to be queried for the results, as we
> do not know which partitions are going to be holding the data. Having every
> query in your system rely on filtering, with a large amount of data and high
> load, will eventually have a substantial negative impact on performance.
>
> I'm not sure how many tenants you're working with, although
> I've seen setups where tenancy was solved by using multiple keyspaces,
> which helps to completely isolate the data and avoid filtering. Given that
> you've tried splitting sstables on tenant_id, that might be solved by using
> multiple keyspaces. This will also help with server resource isolation and
> most of the issues you've raised.
>
>
> On Fri, Jul 15, 2016 at 10:10 AM Romain Hardouin
> <ro...@yahoo.fr.invalid> wrote:
>
> > I don't use C* in such a context, but out of curiosity, did you set
> > the request_scheduler to RoundRobin or did you implement your own
> > scheduler?
> > Romain
> >     On Friday, July 15, 2016 at 8:39 AM, jason zhao yang
> > <zhaoyangsingapore@gmail.com> wrote:
> >
> >
> >  Hi,
> >
> > May I ask whether there is any plan to extend the functionality related
> > to multi-tenancy?
> >
> > Our current approach is to define an extra partition key column called
> > "tenant_id".
> > In my use cases, all tenants will have the same table schemas.
> >
> > * For security isolation: we customized the GRANT statement so that it
> > can restrict user queries based on the "tenant_id" partition.
> >
> > * For getting all data of a single tenant, we customized the SELECT
> > statement to support ALLOW FILTERING on the "tenant_id" partition key.
> >
> > * For server resource isolation, I have no idea how to achieve it.
> >
> > * For per-tenant backup and restore, I tried a
> > tenant_base_compaction_strategy to split sstables based on tenant_id. It
> > turned out to be very inefficient.
> >
> > What is the community's opinion about submitting those patches to
> > Cassandra? It would be great if you could share the ideal multi-tenant
> > architecture for Cassandra.
> >
> > jasonstack
> >
> >
> >
>
> --
> Alex Petrov
>

Re: Support Multi-Tenant in Cassandra

Posted by jason zhao yang <zh...@gmail.com>.
Hi Romain,

Thanks for the reply.

> request_scheduler

It is a legacy feature that only works with the Thrift API.

It would be great to have some sort of scheduling per user/role, but
scheduling requests only provides limited isolation. If the JVM crashes
due to one tenant's invalid request (e.g. inserting a blob into a collection
column), it would be awful.


Thank you.
