Posted to user@cassandra.apache.org by Martin Meyer <el...@gmail.com> on 2014/03/11 15:17:00 UTC

How expensive are additional keyspaces?

Hey all -

My company is working on introducing a configuration service system to
provide config data to several of our applications, to be backed by
Cassandra. We're already using Cassandra for other services, and at
the moment our pending design just puts all the new tables (9 of them,
I believe) in one of our pre-existing keyspaces.

I've got a few questions about keyspaces that I'm hoping for input on.
Some Google hunting didn't turn up obvious answers, at least not for
recent versions of Cassandra.

1) What trade-offs are being made by using a new keyspace versus
re-purposing an existing one (that is in active use by another
application)? Organization is the obvious answer; I'm looking for any
technical reasons.

2) Is there any per-keyspace overhead incurred by the cluster?

3) Does it impact on-disk layout at all for tables to be in a
different keyspace from others? Is any sort of file fragmentation
potentially introduced just by doing this in a new keyspace as opposed
to an existing one?

4) Does it add any metadata overhead to the system keyspace?

5) Why might we *not* want to make a separate keyspace for this?

6) Does anyone have experience with creating additional keyspaces to
the point that Cassandra can no longer handle it? Note that we're
*not* planning to do this, I'm just curious.

Cheers,
Martin

Re: How expensive are additional keyspaces?

Posted by Edward Capriolo <ed...@gmail.com>.
So in the 0.6.X days a signature of a get looked something like this:

get(String keyspace, ColumnPath cp, String rowkey)

Besides the change from String to ByteBuffer, the keyspace was pulled out of
the argument list.

I think a better, more flexible way to do this would be:

struct GetRequest {
   1: optional string keyspace,
   2: required binary rowkey,
   3: optional ColumnPath columnPath
}

ColumnOrSuperColumn get(1: required GetRequest g)

This would put some burden on clients to make builder objects instead of
calling methods, but I think it would make the API easier to evolve.

However, it is hard for me to justify making a second copy of each method
for this small use case. Otherwise I would take that up.
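To make the request-object idea concrete, here is a minimal Java sketch of what a client-side builder for the proposed GetRequest could look like. Everything in it is hypothetical: the class and builder method names are made up, a plain String stands in for Thrift's ColumnPath, and it is not part of any Cassandra client API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.Optional;

// Hypothetical client-side request object mirroring the GetRequest struct
// sketched above: keyspace and columnPath are optional, rowkey is required.
final class GetRequest {
    private final String keyspace;   // null means "not set"
    private final ByteBuffer rowkey; // required
    private final String columnPath; // a plain String stands in for Thrift's ColumnPath

    private GetRequest(Builder b) {
        this.keyspace = b.keyspace;
        this.rowkey = Objects.requireNonNull(b.rowkey, "rowkey is required");
        this.columnPath = b.columnPath;
    }

    Optional<String> keyspace() { return Optional.ofNullable(keyspace); }
    ByteBuffer rowkey() { return rowkey.duplicate(); } // callers cannot move our buffer position
    Optional<String> columnPath() { return Optional.ofNullable(columnPath); }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private String keyspace;
        private ByteBuffer rowkey;
        private String columnPath;

        Builder keyspace(String keyspace) { this.keyspace = keyspace; return this; }

        Builder rowkey(String key) {
            this.rowkey = ByteBuffer.wrap(key.getBytes(StandardCharsets.UTF_8));
            return this;
        }

        Builder columnPath(String columnPath) { this.columnPath = columnPath; return this; }

        // Fails fast with NullPointerException if the required rowkey was never set.
        GetRequest build() { return new GetRequest(this); }
    }
}
```

For example, GetRequest.builder().keyspace("config").rowkey("app1").build() carries its own keyspace, so no per-connection state is needed. Adding another optional field later would mean one more builder method, and existing callers that never set it would keep compiling, which is the evolvability argument made above.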

Re: How expensive are additional keyspaces?

Posted by Peter Lin <wo...@gmail.com>.
If I have time this summer, I may work on that, since I like having Thrift.

Re: How expensive are additional keyspaces?

Posted by Edward Capriolo <ed...@gmail.com>.
This mistake is not a Thrift limitation. In 0.6.X you could switch
keyspaces without calling setKeyspace(String), because the methods specified
the keyspace in every operation. This mirrors the StorageProxy class. In
0.7.X, setKeyspace() was created and the keyspace was removed from all these
Thrift methods. I really dislike that change personally :)

If someone were so motivated, they could pretty easily (a couple of days' work)
add new methods to Thrift that do not have this limitation.
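The cost described here is the extra round trip whenever a connection changes keyspace. The toy Java model below is not a real Thrift client (its class and method names are invented); it only counts simulated round trips for the same reads issued in the 0.6.X per-call style and the 0.7.X connection-scoped style.

```java
import java.util.List;
import java.util.Objects;

// Toy model, not a real Thrift client: it only counts simulated round trips.
final class ConnectionModel {
    private String currentKeyspace; // 0.7.X style: the keyspace is connection state
    private int roundTrips;

    // 0.7.X style: a keyspace change costs an extra set_keyspace round trip
    // before the actual read can be sent.
    void getWithConnectionKeyspace(String keyspace, String rowkey) {
        if (!Objects.equals(keyspace, currentKeyspace)) {
            roundTrips++;               // simulated keyspace switch
            currentKeyspace = keyspace;
        }
        roundTrips++;                   // simulated get(rowkey, ...)
    }

    // 0.6.X style: the keyspace travels with every call, so switching is free.
    void getWithPerCallKeyspace(String keyspace, String rowkey) {
        roundTrips++;                   // simulated get(keyspace, rowkey, ...)
    }

    int roundTrips() { return roundTrips; }

    // Replays a sequence of {keyspace, rowkey} reads on one fresh connection.
    static int replay(List<String[]> reads, boolean perCallKeyspace) {
        ConnectionModel conn = new ConnectionModel();
        for (String[] read : reads) {
            if (perCallKeyspace) {
                conn.getWithPerCallKeyspace(read[0], read[1]);
            } else {
                conn.getWithConnectionKeyspace(read[0], read[1]);
            }
        }
        return conn.roundTrips();
    }
}
```

For four reads that alternate between two keyspaces on one connection, the connection-scoped style needs 8 simulated round trips (4 keyspace switches plus 4 reads), while the per-call style needs 4. Avoiding those switches is why clients ended up keeping one pool per keyspace.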

Re: How expensive are additional keyspaces?

Posted by Jonathan Ellis <jb...@gmail.com>.
That is correct. This is another place where the mistakes of Thrift informed
our development of the native protocol.

On Tue, Mar 11, 2014 at 10:08 AM, Keith Wright <kw...@nanigans.com> wrote:
> Does this hold true for the native protocol? I've noticed that you can
> create a session object in the DataStax driver without specifying a keyspace,
> and so long as you include the keyspace in all queries instead of just the
> table name, it works fine. In that case, I assume there's only one connection
> pool for all keyspaces.
>
> From: Edward Capriolo <ed...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Tuesday, March 11, 2014 at 11:05 AM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: How expensive are additional keyspaces?
>
> The biggest expense of them is that you need to be authenticated to a
> keyspace to perform an operation. Thus connection pools are bound to
> keyspaces, and switching a keyspace is an RPC operation. With the Thrift
> client, if you have 100 keyspaces you need 100 connection pools, which
> starts to be a pain very quickly.
>
> I suggest keeping everything in one keyspace unless you really need
> different replication factors and/or network replication settings per
> keyspace.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced
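Edward's point about keyspace-bound pools, and Keith's observation that one keyspace-agnostic session can serve every keyspace through fully qualified table names, can be illustrated with some simple bookkeeping. This is a hypothetical sketch: Pool and PoolAccounting are not classes from any real driver, and the connection counts are arbitrary.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side bookkeeping; Pool is a stand-in, not a real driver class.
final class PoolAccounting {
    static final class Pool {
        final String keyspace; // null for a keyspace-agnostic pool
        final int connections;

        Pool(String keyspace, int connections) {
            this.keyspace = keyspace;
            this.connections = connections;
        }
    }

    private final Map<String, Pool> poolsByKeyspace = new HashMap<>();
    private final int connectionsPerPool;
    private final boolean keyspaceBound; // true: Thrift style, one pool per keyspace
    private Pool sharedPool;             // used only when keyspaceBound is false

    PoolAccounting(int connectionsPerPool, boolean keyspaceBound) {
        this.connectionsPerPool = connectionsPerPool;
        this.keyspaceBound = keyspaceBound;
    }

    // Returns the pool a request for this keyspace would use, creating it lazily.
    Pool poolFor(String keyspace) {
        if (!keyspaceBound) {
            if (sharedPool == null) sharedPool = new Pool(null, connectionsPerPool);
            return sharedPool;
        }
        return poolsByKeyspace.computeIfAbsent(keyspace, ks -> new Pool(ks, connectionsPerPool));
    }

    // Total connections held open across all pools created so far.
    int openConnections() {
        if (!keyspaceBound) return sharedPool == null ? 0 : sharedPool.connections;
        int total = 0;
        for (Pool p : poolsByKeyspace.values()) total += p.connections;
        return total;
    }

    // With a shared pool, each statement names its table as keyspace.table.
    static String qualified(String keyspace, String table) {
        return keyspace + "." + table;
    }
}
```

With 8 connections per pool, 100 keyspace-bound pools hold 800 connections, where a single shared pool holds 8 and each query names its table as keyspace.table (for example config.settings).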

Re: How expensive are additional keyspaces?

Posted by Ganesh Deka <ga...@gmail.com>.
Sir,
Can you help me configure a Windows Vista machine in DataStax OpsCenter?

Re: How expensive are additional keyspaces?

Posted by Peter Lin <wo...@gmail.com>.
I couldn't resist responding.

Having done some experiments in which I purposely created lots of keyspaces
versus 1 keyspace, the only good reasons I see for many keyspaces are:

1. Each keyspace needs a different replication factor. Even in this case,
I personally can't justify having hundreds of different replication factor
settings. Beyond a replication factor of 4, my biased take is that the
highest number of distinct settings you would need is the number of
datacenters, plus 1 for local workstation development.

2. Using keyspaces to logically organize the schema, to support things like
multi-tenant applications.

I'm sure there are other valid reasons, but those are the ones that come to
my mind.
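For the first reason: replication is declared per keyspace in CQL, so tables that need different replication have to live in different keyspaces. The Java helper below only assembles CREATE KEYSPACE statements as strings. The keyspace names and the datacenter names DC1 and DC2 are hypothetical, and the helper does no identifier quoting or validation.

```java
import java.util.Map;
import java.util.StringJoiner;

// Builds CQL CREATE KEYSPACE statements as plain strings (no validation or quoting).
final class KeyspaceDdl {
    // Per-datacenter replica counts, e.g. {DC1=3, DC2=2}; iteration order is kept as given.
    static String networkTopology(String keyspace, Map<String, Integer> replicasPerDc) {
        StringJoiner opts = new StringJoiner(", ", "{", "}");
        opts.add("'class': 'NetworkTopologyStrategy'");
        for (Map.Entry<String, Integer> e : replicasPerDc.entrySet()) {
            opts.add("'" + e.getKey() + "': " + e.getValue());
        }
        return "CREATE KEYSPACE " + keyspace + " WITH replication = " + opts + ";";
    }

    // Single replication factor with no datacenter awareness, e.g. for a dev workstation.
    static String simple(String keyspace, int replicationFactor) {
        return "CREATE KEYSPACE " + keyspace
            + " WITH replication = {'class': 'SimpleStrategy', 'replication_factor': "
            + replicationFactor + "};";
    }
}
```

With an ordered map of DC1=3 and DC2=2, networkTopology("config_service", ...) yields CREATE KEYSPACE config_service WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2}; and simple("config_dev", 1) gives the single-replica workstation variant.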

Re: How expensive are additional keyspaces?

Posted by Edward Capriolo <ed...@gmail.com>.
The mathematical overhead is one thing. I would guess that if you tried some
design with 10,000 keyspaces and then ran into a bug/performance problem,
the first thing someone would say to you is "WTF, why do you have that many
keyspaces?" :) Don't let that be you.
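If the pressure toward thousands of keyspaces comes from per-tenant separation, the application-level alternative Edward points to in his earlier message (Hector's virtual keyspaces) keeps one real keyspace and namespaces the row keys instead. The Java sketch below illustrates that general idea only; it is not Hector's implementation, and the NUL-byte separator is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Application-level "virtual keyspace": many logical tenants share one real
// keyspace, and each row key is prefixed with the tenant name plus a separator.
final class VirtualKeyspace {
    private static final byte SEPARATOR = 0; // assumption: tenant names never contain NUL
    private final byte[] prefix;

    VirtualKeyspace(String tenant) {
        byte[] name = tenant.getBytes(StandardCharsets.UTF_8);
        for (byte b : name) {
            if (b == SEPARATOR) throw new IllegalArgumentException("NUL in tenant name");
        }
        // Arrays.copyOf pads with 0, so the extra trailing byte is the separator.
        prefix = Arrays.copyOf(name, name.length + 1);
    }

    // Row key as stored in the shared physical keyspace.
    byte[] physicalKey(byte[] logicalKey) {
        byte[] out = Arrays.copyOf(prefix, prefix.length + logicalKey.length);
        System.arraycopy(logicalKey, 0, out, prefix.length, logicalKey.length);
        return out;
    }

    // Strips the prefix again; returns null if the key belongs to another tenant.
    byte[] logicalKey(byte[] physicalKey) {
        if (physicalKey.length < prefix.length) return null;
        for (int i = 0; i < prefix.length; i++) {
            if (physicalKey[i] != prefix[i]) return null;
        }
        return Arrays.copyOfRange(physicalKey, prefix.length, physicalKey.length);
    }
}
```

The separator keeps a tenant named "tenant" from claiming keys that belong to "tenantA", since one tenant name can otherwise be a prefix of another.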

Re: How expensive are additional keyspaces?

Posted by Jeremiah D Jordan <je...@gmail.com>.
Also, in terms of overhead: server side, the overhead is pretty much all at the Column Family (CF)/table level, so 100 keyspaces with 1 CF each cost the same as 1 keyspace with 100 CFs.

-Jeremiah
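Put as arithmetic: if server-side overhead is roughly per table, it depends on the total table count rather than on how tables are grouped into keyspaces. The Java model below is a toy; perTableCost is a placeholder parameter, not a measured Cassandra figure, and it ignores whatever small per-keyspace metadata exists.

```java
// Toy model: server-side cost tracks the number of tables (column families),
// not how they are grouped into keyspaces. perTableCost is a hypothetical
// placeholder, and per-keyspace metadata is deliberately ignored.
final class SchemaOverhead {
    static int totalTables(int keyspaces, int tablesPerKeyspace) {
        return keyspaces * tablesPerKeyspace;
    }

    static long estimatedCost(int keyspaces, int tablesPerKeyspace, long perTableCost) {
        return (long) totalTables(keyspaces, tablesPerKeyspace) * perTableCost;
    }
}
```

100 keyspaces with one table each and one keyspace with 100 tables both come to 100 tables, and Martin's 9 new tables add the same count whether they go into a new keyspace or an existing one.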

On Mar 11, 2014, at 10:36 AM, Jeremiah D Jordan <je...@gmail.com> wrote:

> Using more than one keyspace is not uncommon; using hundreds of them is. That being said, different keyspaces let you specify different replication and different authentication. If you are not going to do either of those things, then there really is no point to multiple keyspaces. If you do want to do one of those things, then go for it and make multiple keyspaces.
> 
> 
> -Jeremiah
> 
> On Mar 11, 2014, at 10:17 AM, Edward Capriolo <ed...@gmail.com> wrote:
> 
>> I am not sure. As stated the only benefit of multiple keyspaces is if you need:
>>  
>> 1) different replication per keyspace
>> 2) different multiple data center configurations per keyspace
>> 
>> Unless you have one of these cases you do not need to do this. I would always tackle this problem at the application level using something like:
>> 
>> http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
>> 
>> Client issues aside, it is not a very common case and I would advise against uncommon setups.
>> 
>> 
>> 
>> On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright <kw...@nanigans.com> wrote:
>> Does this hold true for the native protocol? I’ve noticed that you can create a session object in the DataStax driver without specifying a keyspace, and so long as you include the keyspace in all queries instead of just the table name, it works fine. In that case, I assume there’s only one connection pool for all keyspaces.
>> 
>> From: Edward Capriolo <ed...@gmail.com>
>> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Date: Tuesday, March 11, 2014 at 11:05 AM
>> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
>> Subject: Re: How expensive are additional keyspaces?
>> 
>> The biggest expense of them is that you need to be authenticated to a keyspace to perform an operation. Thus connection pools are bound to keyspaces. Switching a keyspace is an RPC operation. In the Thrift client, if you have 100 keyspaces you need 100 connection pools, which starts to be a pain very quickly.
>> 
>> I suggest keeping everything in one keyspace unless you really need different replication factors and/or network replication settings per keyspace.
>> 
> 


Re: How expensive are additional keyspaces?

Posted by Jeremiah D Jordan <je...@gmail.com>.
The use of more than one keyspace is not uncommon. Using hundreds of them is. That being said, different keyspaces let you specify different replication and different access permissions. If you are not going to be doing one of those things, then there really is no point to multiple keyspaces. If you do want to do one of those things, then go for it and make multiple keyspaces.


-Jeremiah
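Since replication and access control are the two things a keyspace actually carries, a small helper can make the decision explicit. This is a hedged sketch that only builds CQL strings; nothing is sent to a cluster. `CREATE KEYSPACE ... WITH replication = {...}` and `GRANT ... ON KEYSPACE ...` are standard CQL3 statements, `IF NOT EXISTS` assumes Cassandra 2.0 or later, and `GRANT` only takes effect when an authorizer is enabled on the cluster. The keyspace name `config`, the data center names, the `config_reader` user, and the `needs_separate_keyspace` rule-of-thumb function are all hypothetical.

```python
def cql_literal(value):
    """Render a value for a CQL map literal: ints bare, strings single-quoted."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def create_keyspace(name, replication):
    """Build a CREATE KEYSPACE statement from a replication map."""
    pairs = ", ".join(f"{cql_literal(k)}: {cql_literal(v)}" for k, v in replication.items())
    return f"CREATE KEYSPACE IF NOT EXISTS {name} WITH replication = {{{pairs}}};"


def grant_select(keyspace, user):
    """Keyspace-level read permission (requires an authorizer on the cluster)."""
    return f"GRANT SELECT ON KEYSPACE {keyspace} TO {user};"


def needs_separate_keyspace(existing_replication, wanted_replication, separate_access=False):
    """Rule of thumb from the thread: split only for different replication or permissions."""
    return wanted_replication != existing_replication or separate_access


if __name__ == "__main__":
    existing = {"class": "NetworkTopologyStrategy", "DC1": 3}
    wanted = {"class": "NetworkTopologyStrategy", "DC1": 3, "DC2": 2}
    if needs_separate_keyspace(existing, wanted):
        print(create_keyspace("config", wanted))
        print(grant_select("config", "config_reader"))
```

If `wanted` equals `existing` and no separate permissions are needed, the helper returns False and the new tables can simply go into the existing keyspace.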

On Mar 11, 2014, at 10:17 AM, Edward Capriolo <ed...@gmail.com> wrote:

> I am not sure. As stated, the only benefit of multiple keyspaces is if you need:
>
> 1) different replication per keyspace
> 2) different multi-data-center configurations per keyspace
> 
> Unless you have one of these cases you do not need to do this. I would always tackle this problem at the application level using something like:
> 
> http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html
> 
> Client issues aside, it is not a very common case and I would advise against uncommon setups.
> 
> 
> 
> On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright <kw...@nanigans.com> wrote:
> Does this hold true for the native protocol? I’ve noticed that you can create a session object in the DataStax driver without specifying a keyspace, and so long as you include the keyspace in all queries instead of just the table name, it works fine. In that case, I assume there’s only one connection pool for all keyspaces.
> 
> From: Edward Capriolo <ed...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Tuesday, March 11, 2014 at 11:05 AM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: How expensive are additional keyspaces?
> 
> The biggest expense of them is that you need to be authenticated to a keyspace to perform an operation. Thus connection pools are bound to keyspaces. Switching a keyspace is an RPC operation. In the Thrift client, if you have 100 keyspaces you need 100 connection pools, which starts to be a pain very quickly.
> 
> I suggest keeping everything in one keyspace unless you really need different replication factors and/or network replication settings per keyspace.
> 


Re: How expensive are additional keyspaces?

Posted by Edward Capriolo <ed...@gmail.com>.
I am not sure. As stated, the only benefit of multiple keyspaces is if you
need:

1) different replication per keyspace
2) different multi-data-center configurations per keyspace

Unless you have one of these cases you do not need to do this. I would
always tackle this problem at the application level using something like:

http://hector-client.github.io/hector/build/html/content/virtual_keyspaces.html

Client issues aside, it is not a very common case and I would advise
against uncommon setups.
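Hector's virtual keyspaces tackle this on the client side; the idea, as I understand it, is to prefix row keys so several logical tenants share one physical keyspace. The sketch below shows only that idea and is not Hector's API: `VirtualKeyspace` is a hypothetical class, and a plain dict stands in for a single shared table. A `(prefix, key)` tuple is used instead of string concatenation so a prefix containing a separator character cannot collide with another tenant's keys.

```python
class VirtualKeyspace:
    """Hypothetical client-side namespace over one shared table."""

    def __init__(self, shared_table, prefix):
        self._table = shared_table  # dict standing in for a single physical table
        self._prefix = prefix

    def _row_key(self, key):
        # The tenant prefix becomes part of the (composite) row key.
        return (self._prefix, key)

    def put(self, key, value):
        self._table[self._row_key(key)] = value

    def get(self, key, default=None):
        return self._table.get(self._row_key(key), default)

    def keys(self):
        # Only this tenant's keys, with the prefix stripped off.
        return sorted(k for p, k in self._table if p == self._prefix)


if __name__ == "__main__":
    table = {}
    billing = VirtualKeyspace(table, "billing")
    search = VirtualKeyspace(table, "search")
    billing.put("timeout_ms", 500)
    search.put("timeout_ms", 200)
    print(billing.get("timeout_ms"), search.get("timeout_ms"), len(table))
```

Both tenants write the same logical key, yet the shared table holds two distinct rows, and the cluster still sees only one keyspace and one table.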



On Tue, Mar 11, 2014 at 11:08 AM, Keith Wright <kw...@nanigans.com> wrote:

> Does this hold true for the native protocol? I've noticed that you can
> create a session object in the DataStax driver without specifying a
> keyspace, and so long as you include the keyspace in all queries instead of
> just the table name, it works fine. In that case, I assume there's only one
> connection pool for all keyspaces.
>
> From: Edward Capriolo <ed...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Tuesday, March 11, 2014 at 11:05 AM
> To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Subject: Re: How expensive are additional keyspaces?
>
> The biggest expense of them is that you need to be authenticated to a
> keyspace to perform an operation. Thus connection pools are bound to
> keyspaces. Switching a keyspace is an RPC operation. In the Thrift client,
> if you have 100 keyspaces you need 100 connection pools, which starts to be
> a pain very quickly.
>
> I suggest keeping everything in one keyspace unless you really need
> different replication factors and/or network replication settings per
> keyspace.
>

Re: How expensive are additional keyspaces?

Posted by Keith Wright <kw...@nanigans.com>.
Does this hold true for the native protocol? I’ve noticed that you can create a session object in the DataStax driver without specifying a keyspace, and so long as you include the keyspace in all queries instead of just the table name, it works fine. In that case, I assume there’s only one connection pool for all keyspaces.
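For the native-protocol case described here, the keyspace simply travels inside each CQL string. The helper below is a hypothetical sketch that builds fully qualified statements, so a single session (for example one obtained from the driver's `connect()` called without a keyspace argument) can address tables in any keyspace. The keyspace, table, and column names are made up, and nothing here talks to a cluster.

```python
def quote_ident(name):
    """CQL quoted identifier: wrap in double quotes, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def qualified(keyspace, table):
    """keyspace.table form, so the statement does not depend on a session keyspace."""
    return f"{quote_ident(keyspace)}.{quote_ident(table)}"


def select_by_key(keyspace, table, key_column):
    """Statement text with a bind marker (?) for the key value, suitable for preparing."""
    return f"SELECT * FROM {qualified(keyspace, table)} WHERE {quote_ident(key_column)} = ?"


if __name__ == "__main__":
    # One session, two keyspaces: only the statement text differs.
    for ks in ("config", "app_data"):
        print(select_by_key(ks, "settings", "name"))
```

Quoted identifiers are case-sensitive in CQL, so with this approach the names must match the schema exactly; lower-case names behave the same quoted or unquoted.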

From: Edward Capriolo <ed...@gmail.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Tuesday, March 11, 2014 at 11:05 AM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: Re: How expensive are additional keyspaces?

The biggest expense of them is that you need to be authenticated to a keyspace to perform an operation. Thus connection pools are bound to keyspaces. Switching a keyspace is an RPC operation. In the Thrift client, if you have 100 keyspaces you need 100 connection pools, which starts to be a pain very quickly.

I suggest keeping everything in one keyspace unless you really need different replication factors and/or network replication settings per keyspace.


Re: How expensive are additional keyspaces?

Posted by Edward Capriolo <ed...@gmail.com>.
The biggest expense of them is that you need to be authenticated to a
keyspace to perform an operation. Thus connection pools are bound to
keyspaces. Switching a keyspace is an RPC operation. In the Thrift client,
if you have 100 keyspaces you need 100 connection pools, which starts to be
a pain very quickly.

I suggest keeping everything in one keyspace unless you really need
different replication factors and/or network replication settings per
keyspace.
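The connection arithmetic above can be modeled in a few lines. In the Thrift API a connection is bound to a keyspace through `set_keyspace`, so a client that wants to avoid that extra round trip per switch keeps one pool per keyspace. The sketch below is a toy model under that assumption: the class names, pool size, and keyspace names are hypothetical and no real connections are opened. It is contrasted with a single pool whose statements name the keyspace explicitly.

```python
class KeyspaceBoundPools:
    """Thrift-style model: each connection is pinned to one keyspace, so pools multiply."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.pools = {}

    def pool_for(self, keyspace):
        # Lazily open a pool the first time a keyspace is used.
        if keyspace not in self.pools:
            self.pools[keyspace] = [f"{keyspace}-conn{i}" for i in range(self.pool_size)]
        return self.pools[keyspace]

    def total_connections(self):
        return sum(len(pool) for pool in self.pools.values())


class SharedPool:
    """Keyspace named in each statement, so one pool serves every keyspace."""

    def __init__(self, pool_size):
        self.connections = [f"conn{i}" for i in range(pool_size)]

    def total_connections(self):
        return len(self.connections)


if __name__ == "__main__":
    bound = KeyspaceBoundPools(pool_size=8)
    for i in range(100):
        bound.pool_for(f"ks{i}")
    print(bound.total_connections(), SharedPool(pool_size=8).total_connections())
```

With 8 connections per pool, 100 keyspaces cost 800 connections in the keyspace-bound model versus 8 in the shared one (per client, before multiplying by the number of nodes the client talks to).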

