Posted to user@cassandra.apache.org by Anubhav Kale <An...@microsoft.com> on 2016/06/16 17:43:56 UTC

StreamCoordinator.ConnectionsPerHost set to 1

Hello,

I noticed that StreamCoordinator.ConnectionsPerHost is always set to 1 (Cassandra 2.1.13). If I am reading the code correctly, this means there will always be just one socket (well, technically two, one for each direction) between nodes when rebuilding, so the data will always be streamed serially.

Have folks experimented with increasing this? It appears that some parallelism here might help rebuilds significantly, assuming we aren't hitting bandwidth caps (it's a pain for us at the moment to rebuild nodes holding 500GB).

I'm going to try to patch our cluster with a change to test this out, but wanted to hear from experts as well.

Thanks!

RE: StreamCoordinator.ConnectionsPerHost set to 1

Posted by Anubhav Kale <An...@microsoft.com>.
Thanks Paulo. I made some changes along those lines and am seeing good improvement. I will discuss further (with a possible patch) on https://issues.apache.org/jira/browse/CASSANDRA-4663 (this ticket is for bootstrap, so maybe we can repurpose it for rebuilds or create a separate one).

From: Paulo Motta [mailto:pauloricardomg@gmail.com]
Sent: Thursday, June 16, 2016 3:06 PM
To: user@cassandra.apache.org
Subject: Re: StreamCoordinator.ConnectionsPerHost set to 1

Increasing the number of threads alone won't help, because you need to add connectionsPerHost-awareness to StreamPlan.requestRanges (otherwise only a single connection per host is created), similar to what was done to StreamPlan.transferFiles by CASSANDRA-3668, but maybe a bit trickier. There's an open ticket to support that: CASSANDRA-4663.
There's also another discussion on improving rebuild parallelism in CASSANDRA-12015.



Re: StreamCoordinator.ConnectionsPerHost set to 1

Posted by Paulo Motta <pa...@gmail.com>.
Increasing the number of threads alone won't help, because you need to add
connectionsPerHost-awareness to StreamPlan.requestRanges (otherwise only a
single connection per host is created), similar to what was done to
StreamPlan.transferFiles by CASSANDRA-3668, but maybe a bit trickier. There's
an open ticket to support that: CASSANDRA-4663.

There's also another discussion on improving rebuild parallelism on
CASSANDRA-12015.
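
To illustrate what connectionsPerHost-awareness could mean for range requests, here is a minimal sketch. It is not Cassandra's code: the class RangeSplitSketch and the method splitRoundRobin are hypothetical names, and the element type is left generic as a stand-in for a token range. It only shows the idea of spreading the ranges requested from one host across several connections instead of sending them all over one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class RangeSplitSketch {

    /**
     * Distributes the items (standing in for token ranges requested from a
     * single host) round-robin into connectionsPerHost buckets. Each bucket
     * would correspond to one streaming connection to that host.
     */
    static <T> List<List<T>> splitRoundRobin(List<T> ranges, int connectionsPerHost) {
        if (connectionsPerHost < 1) {
            throw new IllegalArgumentException("connectionsPerHost must be >= 1");
        }
        List<List<T>> buckets = new ArrayList<>();
        for (int i = 0; i < connectionsPerHost; i++) {
            buckets.add(new ArrayList<>());
        }
        // Item i goes to bucket (i mod connectionsPerHost).
        for (int i = 0; i < ranges.size(); i++) {
            buckets.get(i % connectionsPerHost).add(ranges.get(i));
        }
        return buckets;
    }
}
```

With connectionsPerHost = 1 this collapses to a single bucket, which corresponds to the single-connection behaviour described above. Round-robin is just one possible assignment; balancing by estimated data size per range might work better in practice.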
