Posted to user@cassandra.apache.org by Philipp Potisk <ph...@geroba.at> on 2014/06/10 23:21:10 UTC
StreamException while adding nodes
Hi,
I tried to double the size of an existing cluster from 4 to 8 nodes. First
I added one node, which joined after 120min successfully. During that time
there was no additional load on the cluster. Afterwards I started the other
3 new nodes one after another so that they would join the cluster
simultaneously. I also put some write load on the cluster. After 45min of the
process, 2 nodes died with the following exception:
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
    at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
Since I restarted Cassandra on the failing nodes (8 hours ago), the 3
nodes have remained in status JOINING, but no data is being exchanged any
more.
Furthermore, nodetool info throws the exception:
Exception in thread "main" java.lang.AssertionError
    at org.apache.cassandra.locator.TokenMetadata.getTokens(TokenMetadata.java:502)
    at org.apache.cassandra.service.StorageService.getTokens(StorageService.java:2132)
which corresponds to isMember returning false in:
public Collection<Token> getTokens(InetAddress endpoint)
{
    assert endpoint != null;
    assert isMember(endpoint);
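To make the failure mode concrete, here is a minimal sketch of how such a membership check behaves. It is not Cassandra's code: TokenMetadataSketch is a hypothetical, heavily simplified stand-in, endpoints are plain strings, and it assumes that a node which has not finished bootstrapping is not yet registered as a ring member. Explicit checks replace the `assert` statements so the behaviour does not depend on the JVM running with -ea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the membership check in the excerpt above.
// Names and data structures are assumptions for illustration only.
class TokenMetadataSketch {
    // Endpoints that have finished joining, mapped to the tokens they own.
    private final Map<String, List<Long>> tokensByEndpoint = new HashMap<>();

    void addMember(String endpoint, List<Long> tokens) {
        tokensByEndpoint.put(endpoint, new ArrayList<>(tokens));
    }

    boolean isMember(String endpoint) {
        return tokensByEndpoint.containsKey(endpoint);
    }

    Collection<Long> getTokens(String endpoint) {
        // Mirrors the two assertions: non-null endpoint, and endpoint is a member.
        if (endpoint == null || !isMember(endpoint)) {
            throw new AssertionError("not a ring member: " + endpoint);
        }
        return tokensByEndpoint.get(endpoint);
    }

    public static void main(String[] args) {
        TokenMetadataSketch ring = new TokenMetadataSketch();
        ring.addMember("10.53.186.53", Arrays.asList(-9137279293977023905L));

        // A fully joined node answers normally.
        System.out.println(ring.getTokens("10.53.186.53"));

        // A node that never completed its join is unknown to the map,
        // so asking for its tokens fails the membership check.
        try {
            ring.getTokens("10.140.118.4");
        } catch (AssertionError e) {
            System.out.println("AssertionError: " + e.getMessage());
        }
    }
}
```

Under that assumption, the AssertionError from nodetool info would be a symptom of the node never completing its join rather than a separate problem.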
My questions right now are:
- What could have caused the streaming error?
- Should nodes not be added while there is load on the cluster? OS load
was between 2 and 6 on a dual-core machine.
- Would it have been better to add the 3 new nodes one by one, rather than
simultaneously?
- How should I proceed with the 3 half-joined nodes, given that they are no
longer exchanging the missing data?
We are using Cassandra 2.0.7 (vnodes and mostly the default config) with
RF 2; each node holds roughly 17 GB of data.
Thanks for any hints,
Phil
Re: StreamException while adding nodes
Posted by Philipp Potisk <ph...@omnecon.com>.
As we are still unable to add the 3 additional nodes, we would appreciate
any further thoughts.
I have removed all 3 half-joined nodes, deleted the data directories and
started only one node. Since then (more than 24 hours ago) the node has been
in status JOINING (nodetool status: UJ, nodetool gossipinfo:
STATUS:BOOT,-7774403902045887560) but has not been receiving any data.
nodetool status shows that only 5.72 MB has arrived so far:
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load      Owns (effective)  Host ID                               Token                 Rack
UJ  10.140.118.4    5.72 MB   ?                 dc110f47-67b0-40c9-bef7-3dff59bfe29c  -9201583989361968764  rack1
UN  10.53.186.53    29.59 GB  43.1%             80cb0036-33b9-4c37-b789-7dac340034ee  -9137279293977023905  rack1
UN  10.140.120.27   25.27 GB  37.8%             2564094b-08ea-42c4-82b0-a8246bd3ebcf  -9201237785760477995  rack1
UN  10.53.170.3     26.82 GB  38.1%             737f49e5-684f-46ef-bf8b-c82326128835  -9106630210265624873  rack1
UN  10.140.104.105  27.88 GB  39.7%             18c74472-235d-4284-9906-0ab8cc40011d  -9213643688261125087  rack1
UN  10.53.170.41    26.28 GB  41.3%             866d2276-0dac-41b3-aece-6a2711ef0234  -9031518559431277310  rack1
Furthermore, it is very strange that nodetool describering does not include
the IP of the new node in its endpoints list. The command
nodetool describering TransactionUseCaseAddNodes | grep 10.140.118.4
does not output anything.
It seems that no token ranges are assigned to this node. However, according
to the documentation on vnodes, rebalancing should happen
automatically.
Is there still a way to force rebalancing in Cassandra 2.X using vnodes? Or
is there something else I could look into?
On 11 June 2014 08:26, Philipp Potisk <ph...@omnecon.com> wrote:
> Hey Rob,
>
> thanks for pointing out the issue with simultaneous bootstraps. However, I
> am not sure if this applies in my case. As a matter of fact I did not start
> the nodes simultaneously - I waited about 10min until they were receiving
> streams from other nodes. So I guess the topology-changes were exchanged as
> expected. Only the joining of the 3 nodes was done simultaneously.
> The StreamException, which killed the process, also happened at a later
> point in time. Since then the nodes have not resumed the join process.
> I am now thinking of decommissioning and starting all over again.
>
> Phil
>
>
> On 11 June 2014 03:13, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Tue, Jun 10, 2014 at 2:21 PM, Philipp Potisk <philipp.potisk@geroba.at> wrote:
>>
>>> First I added one node, which joined after 120min successfully. During
>>> that time there was no additional load on the cluster. Afterwards I started
>>> the other 3 new nodes after each other in order to join the cluster
>>> simultaneously.
>>>
>>
>> Bootstrapping multiple nodes at once is now and has always been Not
>> Supported, but is such a common thing for new operators to try that there
>> is now a goal to prevent them from doing it [1].
>>
>> Cancel those simultaneous bootstraps and do them one at a time, and
>> they'll probably work.
>>
>> [1] https://issues.apache.org/jira/browse/CASSANDRA-7069
>>
>> =Rob
>>
--
DI Philipp Potisk
Omnecon IT e.U.
Klabundgasse 5-7/3/17
1190 Wien
Tel.: +43 660 46 02 632
E-Mail.: philipp.potisk@omnecon.com
Firmenbuchnummer: FN 342255 t
UID: ATU65503966
Re: StreamException while adding nodes
Posted by Philipp Potisk <ph...@omnecon.com>.
Hey Rob,
thanks for pointing out the issue with simultaneous bootstraps. However, I
am not sure if this applies in my case. As a matter of fact I did not start
the nodes simultaneously - I waited about 10min until they were receiving
streams from other nodes. So I guess the topology-changes were exchanged as
expected. Only the joining of the 3 nodes was done simultaneously.
The StreamException, which killed the process, also happened at a later
point in time. Since then the nodes have not resumed the join process.
I am now thinking of decommissioning and starting all over again.
Phil
On 11 June 2014 03:13, Robert Coli <rc...@eventbrite.com> wrote:
> On Tue, Jun 10, 2014 at 2:21 PM, Philipp Potisk <ph...@geroba.at>
> wrote:
>
>> First I added one node, which joined after 120min successfully. During
>> that time there was no additional load on the cluster. Afterwards I started
>> the other 3 new nodes after each other in order to join the cluster
>> simultaneously.
>>
>
> Bootstrapping multiple nodes at once is now and has always been Not
> Supported, but is such a common thing for new operators to try that there
> is now a goal to prevent them from doing it [1].
>
> Cancel those simultaneous bootstraps and do them one at a time, and
> they'll probably work.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-7069
>
> =Rob
>
Re: StreamException while adding nodes
Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Jun 10, 2014 at 2:21 PM, Philipp Potisk <ph...@geroba.at>
wrote:
> First I added one node, which joined after 120min successfully. During
> that time there was no additional load on the cluster. Afterwards I started
> the other 3 new nodes after each other in order to join the cluster
> simultaneously.
>
Bootstrapping multiple nodes at once is now and has always been Not
Supported, but is such a common thing for new operators to try that there
is now a goal to prevent them from doing it [1].
Cancel those simultaneous bootstraps and do them one at a time, and they'll
probably work.
[1] https://issues.apache.org/jira/browse/CASSANDRA-7069
=Rob
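The "one at a time" advice can be checked mechanically before starting the next node. The sketch below is only an illustration: JoiningNodeCheck and its method names are hypothetical, and it assumes the `nodetool status` layout shown earlier in this thread, where the first column is a two-letter code whose second letter is the state (N/L/J/M). It reports any node that is still joining; it does not tell you whether streaming is actually making progress.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans `nodetool status` text for nodes that are still joining.
class JoiningNodeCheck {
    // Returns the addresses of all rows whose state letter is 'J' (Joining),
    // regardless of whether the node is reported Up or Down.
    static List<String> joiningNodes(String statusOutput) {
        List<String> joining = new ArrayList<>();
        for (String line : statusOutput.split("\\R")) {
            String[] fields = line.trim().split("\\s+");
            boolean isNodeRow = fields.length >= 2 && fields[0].length() == 2;
            if (isNodeRow && fields[0].charAt(1) == 'J') {
                joining.add(fields[1]);
            }
        }
        return joining;
    }

    // True only when no node in the output is still joining.
    static boolean safeToBootstrapNext(String statusOutput) {
        return joiningNodes(statusOutput).isEmpty();
    }

    public static void main(String[] args) {
        // Two rows taken from the status output quoted in this thread.
        String sample = String.join("\n",
            "Status=Up/Down",
            "|/ State=Normal/Leaving/Joining/Moving",
            "--  Address       Load      Owns (effective)  Host ID  Token  Rack",
            "UJ  10.140.118.4  5.72 MB   ?      dc110f47-67b0-40c9-bef7-3dff59bfe29c  -9201583989361968764  rack1",
            "UN  10.53.186.53  29.59 GB  43.1%  80cb0036-33b9-4c37-b789-7dac340034ee  -9137279293977023905  rack1");

        System.out.println("Joining: " + joiningNodes(sample));
        System.out.println("Safe to start next bootstrap: " + safeToBootstrapNext(sample));
    }
}
```

In practice the status text would come from running nodetool on an existing node; how that output is captured is left out here.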