Posted to user@cassandra.apache.org by Alexandru Sicoe <ad...@gmail.com> on 2012/03/14 18:05:48 UTC

Datastax Enterprise mixed workload cluster configuration

Hi everyone,
 I want to test DataStax Enterprise in a mixed-workload setup with an
analytics part and a real-time part.

 However, I am not sure how to configure it to achieve what I want: I will
have 3 real machines on one side of a gateway (1,2,3) and 6 VMs on the
other (4,5,6,7,8,9).
 1,2,3 will each run a normal Cassandra node that just takes data directly
from my data sources. I want them to replicate the data to the 6 VMs.
Of those 6 VMs, 4,5,6 will run normal Cassandra nodes and 7,8,9 will run
Analytics nodes. So I only want to write to 1,2,3, serve user reads only
from 4,5,6, and do analytics on 7,8,9. Can I achieve this by configuring
1,2,3,4,5,6 as normal nodes and the rest as analytics nodes? If I alternate
the tokens as explained in
http://www.datastax.com/docs/1.0/datastax_enterprise/init_dse_cluster#init-dse
is it analogous to achieving something like 3 DCs, each getting its own
replica?

Thanks,
Alex

Re: Datastax Enterprise mixed workload cluster configuration

Posted by Alexandru Sicoe <ad...@gmail.com>.
Hi,

Since this thread already contains the system setup, I just want to ask
another question:

Suppose you have 3 data centers (DC1, DC2 and DC3) and a keyspace whose
strategy options give each DC one replica. If you only write to the nodes
in DC1, what path do the replicas take, assuming you've correctly
interleaved and evenly spaced the tokens of all the nodes? If you write a
record to a node in DC1, will it replicate the record to the node in DC2,
which then replicates it to the node in DC3? Or will the node in DC1
replicate the record to both DC2 and DC3?

Cheers,
Alex

Re: Datastax Enterprise mixed workload cluster configuration

Posted by Alexandru Sicoe <ad...@gmail.com>.
Sorry for that last message. I was confused because I thought I needed to
use the DseSimpleSnitch, but of course I can use the PropertyFileSnitch,
which lets me set up the 3-data-center configuration you explained.
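
For anyone following along, here is a rough sketch of what the
cassandra-topology.properties entries for this layout might look like,
generated with a few lines of Python. The IP addresses (10.0.0.1 to
10.0.0.9), the rack name RAC1 and the helper name topology_lines are all
made up for illustration; only the ip=DC:RACK line format and the
"default" entry come from how the PropertyFileSnitch reads the file.

```python
# Hypothetical addresses: node n is assumed to live at 10.0.0.n.
datacenters = {"DC1": [1, 2, 3], "DC2": [4, 5, 6], "DC3": [7, 8, 9]}

def topology_lines(dcs, rack="RAC1"):
    """Build cassandra-topology.properties lines in the form ip=DC:RACK."""
    lines = [
        f"10.0.0.{node}={dc}:{rack}"
        for dc, nodes in dcs.items()
        for node in nodes
    ]
    # Fallback used for any node that is not listed explicitly.
    lines.append(f"default=DC1:{rack}")
    return lines

print("\n".join(topology_lines(datacenters)))
```

The same file would have to be present on every node so that all of them
agree on which node belongs to which DC.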

Cheers,
Alex

Re: Datastax Enterprise mixed workload cluster configuration

Posted by Alexandru Sicoe <ad...@gmail.com>.
Thanks Tyler,
 I see that cassandra.yaml has "endpoint_snitch:
com.datastax.bdp.snitch.DseSimpleSnitch". Will this pick up the
configuration from the cassandra-topology.properties file, as the
PropertyFileSnitch does? Or is there some other way of telling it which
nodes are in which DC?

Cheers,
Alex

Re: Datastax Enterprise mixed workload cluster configuration

Posted by Tyler Hobbs <ty...@datastax.com>.
Yes, you can do this.

You will want to have three DCs: DC1 with [1, 2, 3], DC2 with [4, 5, 6],
and DC3 with [7, 8, 9].  For your normal data keyspace, the replication
strategy should be NetworkTopologyStrategy (NTS), and the strategy_options
should place some replicas in each of the three DCs.  For example:
{DC1: 3, DC2: 3, DC3: 3} if you need that level of replication in each one
(although you probably only want an RF of 1 for DC3).

Your clients that are performing writes should only open connections
against the nodes in DC1, and you should write at CL.ONE or
CL.LOCAL_QUORUM.  Likewise for reads, your clients should only connect to
nodes in DC2, and you should read at CL.ONE or CL.LOCAL_QUORUM.
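
To make the consistency levels concrete, here is a small Python sketch of
how many replicas have to respond at CL.LOCAL_QUORUM for a given
replication factor. It assumes the strategy_options from the example above
with the suggested RF of 1 for DC3; the function name local_quorum is just
for illustration. The rule itself is the usual quorum formula,
floor(RF / 2) + 1, applied to the coordinator's own DC.

```python
# Replicas per data center, following the example strategy_options
# (with the suggested RF of 1 for the analytics DC).
strategy_options = {"DC1": 3, "DC2": 3, "DC3": 1}

def local_quorum(replication_factor):
    """Replicas in the coordinator's DC that must respond: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

for dc, rf in strategy_options.items():
    print(f"{dc}: RF={rf}, LOCAL_QUORUM needs {local_quorum(rf)} local replica(s)")
```

With an RF of 3 in DC1, a LOCAL_QUORUM write is acknowledged once two DC1
replicas have it, so it does not wait on the nodes behind the gateway.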

The nodes in DC3 should run as analytics nodes.  I believe the default CL
for MapReduce jobs is ONE, which would work.

As far as tokens go, interleaving all three DCs and evenly spacing the
tokens will work.  For example, the ordering of your nodes might be [1, 4,
7, 2, 5, 8, 3, 6, 9].
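
As a sketch of the token assignment described above, the following Python
computes evenly spaced initial tokens for that interleaved ordering. It
assumes the RandomPartitioner, whose token range runs from 0 to 2**127;
the function name interleaved_tokens is only for illustration.

```python
# RandomPartitioner tokens fall in the range 0 .. 2**127 (assumed partitioner).
RING_SIZE = 2 ** 127

# Ring order from the example: one node from each DC in turn.
ring_order = [1, 4, 7, 2, 5, 8, 3, 6, 9]

def interleaved_tokens(order, ring_size=RING_SIZE):
    """Map each node id to position * ring_size // node_count."""
    count = len(order)
    return {
        node: position * ring_size // count
        for position, node in enumerate(order)
    }

tokens = interleaved_tokens(ring_order)
for node in ring_order:
    print(f"node {node}: initial_token = {tokens[node]}")
```

Each printed value would go into initial_token in that node's
cassandra.yaml. Walking the ring in token order then alternates DC1, DC2,
DC3, so each DC's nodes are themselves evenly spaced.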


-- 
Tyler Hobbs
DataStax <http://datastax.com/>