Posted to user@cassandra.apache.org by Chuck Reynolds <cr...@ancestry.com> on 2017/08/07 18:50:37 UTC

Different data size between datacenters

I have a cluster that spans two datacenters running Cassandra 2.1.12.  135 nodes in my data center and about 185 in AWS.

The size of the second data center (AWS) is quite a bit smaller.  Replication is the same in both datacenters.  Is there a logical explanation for this?

thanks

Re: Different data size between datacenters

Posted by Chuck Reynolds <cr...@ancestry.com>.
So we have the default num_tokens of 256 in our datacenter and 128 in AWS.

From: "ZAIDI, ASAD A" <az...@att.com>
Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Date: Monday, August 7, 2017 at 1:36 PM
To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
Subject: RE: Different data size between datacenters

Are you using the same number of tokens/vnodes in both data centers?

RE: Different data size between datacenters

Posted by "ZAIDI, ASAD A" <az...@att.com>.
Are you using the same number of tokens/vnodes in both data centers?

Re: Different data size between datacenters

Posted by Jeff Jirsa <jj...@gmail.com>.
Tombstones should eventually compact away in most cases, but if you've
recently changed topology (added nodes, removed nodes, etc.), you should run
"nodetool cleanup" to remove no-longer-owned data. Start by running it on
one instance at a time: it's a form of compaction and can impact disk space
and latencies.
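
A minimal sketch of the one-instance-at-a-time approach, assuming passwordless ssh to every node and nodetool on each node's PATH; the host names and the optional keyspace argument are hypothetical placeholders. Each cleanup is allowed to finish before the next one starts, so only one node at a time carries the extra compaction I/O and temporary disk usage.

```python
import subprocess
from typing import Callable, Iterable, List, Optional

# Hypothetical host names: replace with the nodes that need cleanup.
HOSTS = ["cass-node-01", "cass-node-02", "cass-node-03"]


def cleanup_commands(hosts: Iterable[str], keyspace: Optional[str] = None) -> List[List[str]]:
    """Build one "nodetool cleanup" command per host, meant to run strictly in sequence.

    Assumes passwordless ssh to each host and nodetool on the remote PATH.
    """
    commands = []
    for host in hosts:
        cmd = ["ssh", host, "nodetool", "cleanup"]
        if keyspace:
            cmd.append(keyspace)  # optionally limit cleanup to one keyspace
        commands.append(cmd)
    return commands


def run_sequentially(commands: List[List[str]],
                     runner: Callable[[List[str]], int] = subprocess.call) -> None:
    """Run each command to completion before starting the next; stop at the first failure."""
    for cmd in commands:
        rc = runner(cmd)  # waits until this node's cleanup has finished
        if rc != 0:
            raise RuntimeError(f"{' '.join(cmd)} exited with status {rc}")


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    for cmd in cleanup_commands(HOSTS):
        print(" ".join(cmd))
```

Running it as a script only prints the plan; calling run_sequentially(cleanup_commands(HOSTS)) would execute it. Stopping on the first non-zero exit status avoids moving on to the next node while a failed cleanup is still unexplained.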


On Mon, Aug 7, 2017 at 2:04 PM, Chuck Reynolds <cr...@ancestry.com>
wrote:

> Yes, it's the total size.
>
> Could it be that tombstones or data that nodes no longer own are not being
> copied/streamed to the data center in AWS?
>
> From: Jeff Jirsa <jj...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Monday, August 7, 2017 at 2:51 PM
> To: cassandra <us...@cassandra.apache.org>
> Subject: Re: Different data size between datacenters
>
> And when you say the data size is smaller, you mean per node? Or sum of
> all nodes in the datacenter?
>
> With 185 hosts in AWS vs 135 in your DC, I would expect your DC hosts to
> have about 37% more data per host than AWS (each AWS host roughly 27% less).
>
> If instead they have twice as much, it sounds like it's balancing by # of
> tokens instead, which may be an indication that you're somehow using
> SimpleStrategy, or your NetworkTopologyStrategy is somehow misconfigured
> for one or more keyspaces.
>
> Can you paste your keyspace replication strategy lines, anonymized as
> needed?
>
> On Mon, Aug 7, 2017 at 1:46 PM, Chuck Reynolds <cr...@ancestry.com>
> wrote:
>
> Yes to the NetworkTopologyStrategy.
>
> From: Jeff Jirsa <jj...@gmail.com>
> Reply-To: "user@cassandra.apache.org" <us...@cassandra.apache.org>
> Date: Monday, August 7, 2017 at 2:39 PM
> To: cassandra <us...@cassandra.apache.org>
> Subject: Re: Different data size between datacenters
>
> You're using NetworkTopologyStrategy and not SimpleStrategy, correct?
>

Re: Different data size between datacenters

Posted by Chuck Reynolds <cr...@ancestry.com>.
Yes, it's the total size.

Could it be that tombstones or data that nodes no longer own are not being copied/streamed to the data center in AWS?

Re: Different data size between datacenters

Posted by Chuck Reynolds <cr...@ancestry.com>.
The keyspace has:

    WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3', 'us-east-productiondata': '3'} AND durable_writes = true;
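
Jeff's concern about a misconfigured NetworkTopologyStrategy can be checked mechanically: the datacenter names in this map have to match, exactly, the names the cluster reports. A minimal sketch, assuming you supply the datacenter names as "nodetool status" prints them; NetworkTopologyStrategy places no replicas in a datacenter the map does not name, so a spelling mismatch leaves that datacenter without copies of the keyspace.

```python
from typing import Dict, Iterable, List

# Replication map from the keyspace definition above.
REPLICATION = {
    "class": "NetworkTopologyStrategy",
    "DC1": "3",
    "us-east-productiondata": "3",
}


def replication_problems(replication: Dict[str, str], cluster_dcs: Iterable[str]) -> List[str]:
    """List mismatches between an NTS replication map and the cluster's datacenter names.

    cluster_dcs must be spelled exactly as "nodetool status" prints them.
    """
    strategy = replication.get("class", "")
    if not strategy.endswith("NetworkTopologyStrategy"):
        return [f"strategy is {strategy!r}, not NetworkTopologyStrategy"]
    map_dcs = {dc for dc in replication if dc != "class"}
    live_dcs = set(cluster_dcs)
    problems = []
    # Datacenters the cluster has but the map does not mention get no replicas.
    for dc in sorted(live_dcs - map_dcs):
        problems.append(f"datacenter {dc!r} is missing from the map, so it gets 0 replicas")
    # Names in the map that match no real datacenter are usually typos.
    for dc in sorted(map_dcs - live_dcs):
        problems.append(f"map names {dc!r}, which is not a datacenter in this cluster")
    for dc in sorted(map_dcs & live_dcs):
        if int(replication[dc]) < 1:
            problems.append(f"datacenter {dc!r} has replication factor {replication[dc]}")
    return problems
```

For example, if the AWS datacenter were actually reported under a different name (hypothetically "us-east"), replication_problems(REPLICATION, ["DC1", "us-east"]) would flag both the unreplicated datacenter and the unknown name in the map; with matching names it returns an empty list.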

Re: Different data size between datacenters

Posted by Jeff Jirsa <jj...@gmail.com>.
And when you say the data size is smaller, you mean per node? Or sum of all
nodes in the datacenter?
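
To answer the per-node versus sum question from actual numbers, one option is to total the Load column of "nodetool status" per datacenter. A minimal sketch, assuming the 2.1-style row layout (status/state code, address, load with a unit) and binary unit multiples; rows it cannot parse are skipped rather than counted. Load is on-disk SSTable size, so it also includes tombstones and any data a node no longer owns until compaction or cleanup removes it.

```python
import re
from collections import defaultdict
from typing import Dict

# Load-column unit suffixes, assumed to be binary multiples (1 KB = 1024 bytes).
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

DC_LINE = re.compile(r"^Datacenter:\s*(\S+)")
# Node rows start with a status/state code such as UN or DN, then the address,
# then the load figure and its unit.
NODE_LINE = re.compile(r"^[UD][NLJM]\s+\S+\s+([\d.]+)\s*(bytes|KB|MB|GB|TB)\b")


def load_by_datacenter(status_output: str) -> Dict[str, Dict[str, float]]:
    """Sum the Load column of "nodetool status" output per datacenter.

    Returns {dc: {"nodes": n, "total_bytes": t, "bytes_per_node": t / n}}.
    """
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: {"nodes": 0, "total_bytes": 0.0})
    dc = None
    for raw in status_output.splitlines():
        line = raw.strip()
        header = DC_LINE.match(line)
        if header:
            dc = header.group(1)  # rows that follow belong to this datacenter
            continue
        node = NODE_LINE.match(line)
        if node and dc is not None:
            totals[dc]["nodes"] += 1
            totals[dc]["total_bytes"] += float(node.group(1)) * UNITS[node.group(2)]
    return {name: {**t, "bytes_per_node": t["total_bytes"] / t["nodes"]}
            for name, t in totals.items()}
```

Feeding it the output of "nodetool status" from any one node gives both figures side by side, so the per-node and per-datacenter comparisons no longer depend on which one was meant.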

With 185 hosts in AWS vs 135 in your DC, I would expect your DC hosts to
have about 37% more data per host than AWS (each AWS host roughly 27% less).
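
A worked version of that per-host estimate, assuming both datacenters store the same total volume T (replication is reported to be the same in both) spread evenly across about 185 AWS hosts and 135 DC hosts:

```latex
\frac{\text{load per DC host}}{\text{load per AWS host}}
  = \frac{T/135}{T/185} = \frac{185}{135} \approx 1.37,
\qquad
\frac{\text{load per AWS host}}{\text{load per DC host}}
  = \frac{135}{185} \approx 0.73
```

So each DC host should carry roughly 37% more than an AWS host, while the two datacenter totals should come out about equal.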

If instead they have twice as much, it sounds like it's balancing by # of
tokens instead, which may be an indication that you're somehow using
SimpleStrategy, or your NetworkTopologyStrategy is somehow misconfigured
for one or more keyspaces.
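
The "balancing by number of tokens" point can be illustrated with a small simulation. This is a model under simplifying assumptions, not how Cassandra computes ownership internally: random tokens on a Murmur3-sized ring, RF 1 and primary ranges only, and hypothetical node names. With one global ring (the SimpleStrategy case) a 256-token node owns about twice what a 128-token node does; with each datacenter placing its replica on the first of its own nodes clockwise (what NetworkTopologyStrategy does at RF 1 per datacenter), per-node share depends only on how many nodes the datacenter has.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Dict, List

RING = 2**64  # size of the Murmur3Partitioner token space, -2**63 .. 2**63 - 1


def random_tokens(nodes: List[str], num_tokens: int, rng: random.Random) -> Dict[int, str]:
    """Give every node num_tokens random tokens (collisions are vanishingly unlikely)."""
    return {rng.randrange(-2**63, 2**63): node for node in nodes for _ in range(num_tokens)}


def primary_ownership(token_to_node: Dict[int, str]) -> Dict[str, float]:
    """Fraction of the ring each node owns as primary replica (RF 1).

    Each token owns the range (previous token, token], wrapping around the ring.
    """
    tokens = sorted(token_to_node)
    owned: Dict[str, int] = defaultdict(int)
    for i, tok in enumerate(tokens):
        prev = tokens[i - 1]  # for i == 0 this wraps to the highest token
        # A lone token owns the whole ring; the modulo alone would give 0.
        owned[token_to_node[tok]] += (tok - prev) % RING or RING
    return {node: size / RING for node, size in owned.items()}


def average_share(ownership: Dict[str, float], prefix: str) -> float:
    """Mean ownership fraction over the nodes whose names start with prefix."""
    return mean(share for node, share in ownership.items() if node.startswith(prefix))


rng = random.Random(42)
dc_ring = random_tokens([f"dc-{i}" for i in range(135)], 256, rng)    # num_tokens 256
aws_ring = random_tokens([f"aws-{i}" for i in range(185)], 128, rng)  # num_tokens 128

# SimpleStrategy-like: both datacenters share one ring, so share follows token count.
g = primary_ownership({**dc_ring, **aws_ring})
global_ratio = average_share(g, "dc-") / average_share(g, "aws-")

# NetworkTopologyStrategy-like (RF 1 per datacenter): ownership on each datacenter's own sub-ring.
d, a = primary_ownership(dc_ring), primary_ownership(aws_ring)
per_dc_ratio = average_share(d, "dc-") / average_share(a, "aws-")

print(f"one global ring: DC/AWS per-node share = {global_ratio:.2f}")
print(f"per-datacenter:  DC/AWS per-node share = {per_dc_ratio:.2f}")
```

Under these assumptions the first ratio comes out close to 2 (256 vs 128 tokens), and the second is 185/135 ≈ 1.37, the same figure as the host-count estimate: only the global ring makes the num_tokens difference matter.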

Can you paste your keyspace replication strategy lines, anonymized as
needed?


Re: Different data size between datacenters

Posted by Chuck Reynolds <cr...@ancestry.com>.
Yes to the NetworkTopologyStrategy.

Re: Different data size between datacenters

Posted by Jeff Jirsa <jj...@gmail.com>.
You're using NetworkTopologyStrategy and not SimpleStrategy, correct?
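
Since a single keyspace left on SimpleStrategy is enough to skew per-node sizes, it can be worth checking every keyspace, not only the main one. A minimal sketch, assuming the Cassandra 2.x schema table system.schema_keyspaces (later versions moved this to system_schema.keyspaces with a replication map); the rows would come from cqlsh or a driver, and the keyspace names used with it are up to you.

```python
import json
from typing import Iterable, List, Tuple

# One result row of (assumed Cassandra 2.x schema table):
#   SELECT keyspace_name, strategy_class, strategy_options FROM system.schema_keyspaces;
Row = Tuple[str, str, str]


def non_nts_keyspaces(rows: Iterable[Row]) -> List[Tuple[str, str, dict]]:
    """Return (keyspace, short strategy name, options) for every keyspace that is
    replicated with something other than NetworkTopologyStrategy.

    LocalStrategy keyspaces (node-local system tables) are skipped because they
    are never replicated.
    """
    flagged = []
    for name, strategy_class, options_json in rows:
        short = strategy_class.rsplit(".", 1)[-1]  # drop the Java package prefix
        if short in ("NetworkTopologyStrategy", "LocalStrategy"):
            continue
        flagged.append((name, short, json.loads(options_json or "{}")))
    return flagged
```

Any keyspace this returns places replicas around one global ring, which is the token-count balancing described earlier in the thread.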

