Posted to user@cassandra.apache.org by Felipe Esteves <fe...@b2wdigital.com> on 2016/02/26 20:48:14 UTC

Nodetool Rebuild sending few big packets of data. Is it normal?

Hi,

I'm running a nodetool rebuild to add a new DC to my cluster.
My config is:
DC1, 2 nodes per rack (2 racks), 70 GB per node
DC2, 2 nodes per rack (1 rack), 90 GB per node
DC3, 2 nodes per rack (1 rack) (*THIS IS THE NEW DC*)

What I did was bring the 2 nodes in DC3 up with auto_bootstrap: false,
and then run a rebuild using DC2 as the source.

However, when I started, the load on both new nodes rapidly increased to
1.4 GB, according to nodetool status. It then grew slowly for 4 hours, in
increments of about 10 MB. Then, suddenly, 1 node had 49.5 GB and the other
followed soon after.
In the instance logs, I only have stream messages from when I started
the rebuild.

My question is: is it normal for Cassandra to accumulate this amount of data
and then send it? I was hoping it would be a more gradual and incremental
process.

thanks,

Felipe Esteves

Tecnologia

felipe.esteves@b2wdigital.com <se...@b2wdigital.com>

Tel.: (21) 3504-7162 ramal 57162

-- 


Re: Nodetool Rebuild sending few big packets of data. Is it normal?

Posted by Felipe Esteves <fe...@b2wdigital.com>.
Hi Jeff,

Thanks for the info, you're right!

Felipe Esteves


2016-02-26 17:38 GMT-03:00 Jeff Jirsa <je...@crowdstrike.com>:

> Cassandra is streaming it at a near-constant rate (if you had metrics for the
> network interface, you’d probably see that), but it doesn’t register in
> nodetool status until it completes all of the sstables for a column family.
> At that point, the -tmp-Data.db files get renamed to drop the -tmp, and
> they become live on the node.
>
> I suspect you have a table/CF that’s approximately 47-48 GB, and it
> completed, and its size in nodetool status jumped at that time.

-- 


Re: Nodetool Rebuild sending few big packets of data. Is it normal?

Posted by Jeff Jirsa <je...@crowdstrike.com>.
Cassandra is streaming it at a near-constant rate (if you had metrics for the network interface, you’d probably see that), but it doesn’t register in nodetool status until it completes all of the sstables for a column family. At that point, the -tmp-Data.db files get renamed to drop the -tmp, and they become live on the node.

I suspect you have a table/CF that’s approximately 47-48 GB, and it completed, and its size in nodetool status jumped at that time.
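
For anyone who wants to watch this happen during a rebuild, nodetool netstats reports streaming progress, and the in-flight data can also be measured on disk. The following is only a rough sketch in Python, not anything shipped with Cassandra: the function name sstable_bytes and the "-tmp-" marker are assumptions for illustration, and the exact temporary file naming varies between Cassandra versions, so adjust the marker to what you see in your data directory.

```python
import os


def sstable_bytes(data_dir, tmp_marker="-tmp-"):
    """Return (in_flight_bytes, live_bytes) for *Data.db files under data_dir.

    Files whose names contain tmp_marker are counted as still streaming;
    all other Data.db files are counted as live. The marker is an assumption
    and depends on the Cassandra version in use.
    """
    in_flight = 0
    live = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if not name.endswith("Data.db"):
                continue  # skip index, filter, and other component files
            try:
                size = os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file was renamed or removed between listing and stat,
                # which is expected while streams are completing.
                continue
            if tmp_marker in name:
                in_flight += size
            else:
                live += size
    return in_flight, live
```

Calling sstable_bytes("/var/lib/cassandra/data") every few minutes (that path is the usual default and is also an assumption here) should show the in-flight total growing steadily while the live total stays flat, until a table completes and its bytes move from the first number to the second.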


