Posted to user@cassandra.apache.org by Marc Hoppins <ma...@eset.com> on 2022/06/02 12:13:17 UTC

Topology vs RackDC

Hi all,

Why is RACKDC preferred over TOPOLOGY for production?

Surely one common file is far simpler to distribute than dealing with the mucky-muck of a different config for each host, depending on whether it is in one rack or another and/or one datacentre or another?  It also makes the setup fairly self-documenting, with the entire cluster there in one file.

From what I read in the documentation, regardless of which snitch one implements, cassandra-topology.properties will get read, either as a primary or as a backup... so why not just use topology for ALL cases?

Thanks

Marc

Re: Topology vs RackDC

Posted by Elliott Sims <el...@backblaze.com>.
In terms of turning it into Ansible, it's going to depend a lot on how you
manage the physical layer as well as replication/consistency.  Currently, I
just use groups per "rack".  If you have an API-accessible CMDB you could
probably pull the physical location from there and translate that to
rack/DC info.  In our case the "rack" is used to control replica locations,
and that actually drives where the hosts will be physically located (trying
to avoid multiple racks/replicas on one switch or power rail).
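
A minimal sketch of the "groups per rack" idea, assuming the inventory groups are named `<dc>_<rack>` (a hypothetical naming convention, not one stated in this thread) and that each host belongs to exactly one such group. The function name and host names are illustrative only; a CMDB lookup could produce the same host-to-location mapping.

```python
from typing import Dict, List, Tuple


def placement_from_groups(groups: Dict[str, List[str]]) -> Dict[str, Tuple[str, str]]:
    """Map each host to (dc, rack) using group names of the form '<dc>_<rack>'."""
    placement: Dict[str, Tuple[str, str]] = {}
    for group, hosts in groups.items():
        # Split on the first underscore only, so rack names may contain underscores.
        dc, sep, rack = group.partition("_")
        if not (sep and dc and rack):
            raise ValueError(f"group {group!r} is not of the form '<dc>_<rack>'")
        for host in hosts:
            if host in placement:
                raise ValueError(f"{host} is listed in more than one rack group")
            placement[host] = (dc, rack)
    return placement


if __name__ == "__main__":
    # Hypothetical inventory: group name -> hosts in that group.
    inventory = {
        "dc1_rack1": ["cass01", "cass02"],
        "dc1_rack2": ["cass03"],
        "dc2_rack1": ["cass04"],
    }
    for host, (dc, rack) in sorted(placement_from_groups(inventory).items()):
        print(f"{host}: dc={dc} rack={rack}")
```

The resulting mapping can then feed whichever file the chosen snitch reads.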

RE: Topology vs RackDC

Posted by Marc Hoppins <ma...@eset.com>.
There are cases supporting both sides. I can see the benefits of the more dynamic setup.

However, how do you ansible/automate when you have multiple switches in 2 or more datacentres and all your nodes are in the same VLAN or VLANs? This is the sticking point which I am trying to get to the bottom of.  I am not fully au fait with Ansible, and we are also using Ansible Tower, which allows for more flexibility, so there should be some practical options.

RE: Topology vs RackDC

Posted by "Durity, Sean R" <SE...@homedepot.com>.
I agree; it does depend. Our Ansible could not infer the DC name from the hostname or IP address of our on-prem hardware. That’s especially true when we are migrating to new hardware or a new OS and we are adding logical DCs with different names. I suppose it could be embedded in the Ansible hosts file (but you are still maintaining that master file), but we don’t organize our hosts file that way. We are rarely adding a few nodes here or there, so the penalty of a rolling restart is minimal for us.

Sean R. Durity


Re: Topology vs RackDC

Posted by Bowen Song <bo...@bso.ng>.
It really depends on how you manage your nodes. With automation
tools, like Ansible, it's much easier to manage the rackdc file per
node. The "master list" doesn't need to exist, because the file is
written once and will never get updated. The automation tool creates
nodes based on the required DC/rack, and writes that information to the
rackdc file during the node provisioning process. It's much faster to
add nodes to a large cluster with the rackdc file - no rolling restart
required.
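
A minimal sketch of the provisioning step described above, assuming the automation already knows the target DC and rack for the node being built. The function name is hypothetical; `dc` and `rack` are the keys used in cassandra-rackdc.properties.

```python
def render_rackdc(dc: str, rack: str) -> str:
    """Return the text of a cassandra-rackdc.properties file for one node."""
    for name, value in (("dc", dc), ("rack", rack)):
        # Reject empty names and names containing whitespace.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"{name} must be a non-empty name without whitespace")
    return f"dc={dc}\nrack={rack}\n"


if __name__ == "__main__":
    # Example values only; a real run would take them from the provisioning request.
    print(render_rackdc("DC1", "RAC1"), end="")
```

Because the output depends only on the node's own location, adding a node does not change the file on any existing node.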

RE: Topology vs RackDC

Posted by "Durity, Sean R" <SE...@homedepot.com>.
I agree with Marc. We use the cassandra-topology.properties file (and PropertyFileSnitch) for our deployments. Having a different file on every node has never made sense to me. There would still have to be some master file somewhere from which to generate each individual node file. There is the (slight) penalty that a change in topology requires the distribution of a new file and a rolling restart.

Long live the PropertyFileSnitch! 😉

Sean R. Durity

Re: Topology vs RackDC

Posted by Aaron Ploetz <aa...@gmail.com>.
Agree with Paulo on this one.

If you're running a cluster on K8s, in the cloud, or under some other
conditions in which IPs change, the cassandra-topology.properties file is
going to quickly become a burden.  Especially if that cluster has more than
just a few nodes.

Increasing your cluster from 99 nodes to 120?  Need to make sure that
updated file ends up on all 120 nodes, with all 120 IPs.  Also, I'm pretty
sure that a node can't read the new file without a restart.  So now you're
talking about bouncing a 100+ node cluster just to add new nodes.  Same
with replacing a single node or IP.  Update the file, push it out to all
nodes, and restart all nodes.
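
A minimal sketch of why the shared file grows with the cluster, assuming IPv4 addresses and a hypothetical in-memory node list. The function name is illustrative; the `IP=DC:RACK` entries and the `default=` line follow the cassandra-topology.properties format.

```python
from typing import Dict, Tuple


def render_topology(nodes: Dict[str, Tuple[str, str]], default: Tuple[str, str]) -> str:
    """Return cassandra-topology.properties text listing every node in the cluster."""
    # One line per node, sorted by the IP string so the output is stable.
    lines = [f"{ip}={dc}:{rack}" for ip, (dc, rack) in sorted(nodes.items())]
    # Fallback location for nodes that are not listed.
    lines.append(f"default={default[0]}:{default[1]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    nodes = {
        "10.0.1.10": ("DC1", "RAC1"),
        "10.0.1.11": ("DC1", "RAC2"),
        "10.0.2.10": ("DC2", "RAC1"),
    }
    before = render_topology(nodes, ("DC1", "RAC1"))
    nodes["10.0.2.11"] = ("DC2", "RAC2")  # one new node joins
    after = render_topology(nodes, ("DC1", "RAC1"))
    # The shared file changed, so every host now holds an outdated copy.
    print("file changed for all hosts:", before != after)
```

Every added or replaced IP changes this one file, which then has to be pushed to all hosts.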

Basically, the cassandra-rackdc.properties file is preferred because it
makes operations easier at scale.  Config files that change less make
everyone happier!

At my last gig, we built the propagation of the cassandra-rackdc.properties
file into our node-add automation.  That way it looks at the target
instance, and already knows which DC and rack to inject into the file.

Thanks,

Aaron

P.S. - When using the cassandra-rackdc.properties file, the prevailing
production wisdom has always been to delete the
cassandra-topology.properties file to keep it from interfering.  Not sure
if that's still an issue or not, but it was in the 3.x versions.
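
A minimal sketch of a pre-flight check for the advice in the P.S., assuming both files live in the same configuration directory (the directory path is whatever your installation uses; none is assumed here). The function name is hypothetical.

```python
from pathlib import Path
from typing import Optional


def stale_topology_file(conf_dir: str) -> Optional[Path]:
    """Return the leftover cassandra-topology.properties path if both files exist."""
    conf = Path(conf_dir)
    rackdc = conf / "cassandra-rackdc.properties"
    topology = conf / "cassandra-topology.properties"
    # Only flag the topology file when the rackdc file is also in use.
    if rackdc.is_file() and topology.is_file():
        return topology
    return None


if __name__ == "__main__":
    import sys

    # Usage: python check_topology.py <cassandra-conf-dir>
    leftover = stale_topology_file(sys.argv[1]) if len(sys.argv) > 1 else None
    if leftover is not None:
        print(f"consider removing {leftover}")
```

The check only reports the file; whether to delete it remains an operator decision.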


Re: Topology vs RackDC

Posted by Paulo Motta <pa...@gmail.com>.
I think the topology file is better for static clusters, while rackdc is
better for dynamic clusters where users can add/remove hosts without
needing to update the topology file on all hosts.
