Posted to user@cassandra.apache.org by Anubhav Kale <An...@microsoft.com> on 2016/10/13 14:57:41 UTC

New node overstreaming data ?

Hello,

We run 2.1.13 and are seeing an odd issue. A node went down and stayed down long enough that it dropped out of gossip. When we try to bootstrap it again (as a new node), it overstreams from other nodes until the disk fills up and the node crashes. This has repeated 3 times.

Does anyone have any insights on what to try next (both in terms of root-causing and working around it)? As a workaround, we tried increasing the number of concurrent compactors and reducing stream throughput so that at least the number of incoming SSTables would stay under control.
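
For reference, the stream throughput change can be made on the fly with nodetool; the value below is just an example, not a recommendation:

    # Run on the nodes that are streaming data to the bootstrapping node.
    # Units match stream_throughput_outbound_megabits_per_sec in cassandra.yaml.
    nodetool setstreamthroughput 50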

This has happened to us a few times in the past too, so I am wondering if this is a known problem (I couldn't find any JIRAs).

Thanks!

Re: New node overstreaming data ?

Posted by Ben Bromhead <be...@instaclustr.com>.
Overstreaming is pretty common, especially with vnodes in 2.x. When
Cassandra streams data to a bootstrapping node, it sends the entire SSTable
that contains the data the new node requires, even if that SSTable holds
only 1 relevant row out of thousands. This can be exacerbated by STCS with
large SSTables.
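
One way to confirm this is to look at the on-disk SSTable sizes on the
nodes streaming to the new node; with STCS a few very large files often
dominate. A rough check, assuming the default data directory layout
(keyspace and table names are placeholders):

    # List the largest -Data.db files for a suspect table on a source node
    ls -lhS /var/lib/cassandra/data/<keyspace>/<table>-*/*-Data.db | head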

Generally, reducing the network streaming throughput and increasing
concurrent_compactors (and un-throttling compaction throughput) is the way
to go. If you are in the cloud (e.g. AWS), you can also attach large block
storage volumes (EBS) to the instance to act as overflow.
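
If you want to bake these settings in before retrying the bootstrap, the
cassandra.yaml equivalents look roughly like this (the values are only
examples; tune them for your hardware):

    # cassandra.yaml on the bootstrapping node (set before starting it)
    concurrent_compactors: 8                # more parallel compactions
    compaction_throughput_mb_per_sec: 0     # 0 = compaction unthrottled

    # cassandra.yaml on the existing (source) nodes
    stream_throughput_outbound_megabits_per_sec: 50   # slow outbound streaming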

There is some ongoing work in Cassandra 3.x that will make streaming more
efficient and allow Cassandra to stream only the portions of an SSTable
that are required.


-- 
Ben Bromhead
CTO | Instaclustr <https://www.instaclustr.com/>
+1 650 284 9692
Managed Cassandra / Spark on AWS, Azure and Softlayer