Posted to user@cassandra.apache.org by James Vanns <jv...@gmail.com> on 2015/07/30 18:23:49 UTC

AWS multi-region DCs fail to rebuild

Hi. First, some details:

* Ubuntu 14.04.2 LTS.
* Oracle Java 8
* Cassandra 2.2 (from datastax repo)
* AWS VPC - two regions (Oregon, Ireland)
* A pair of 3 node DCs in a single cluster - 1 DC per region as above
* Ec2Snitch (NOT the Ec2MultiRegionSnitch - does not work in a VPC
environment)

Following this documentation:

http://docs.datastax.com/en/cassandra/2.2/cassandra/operations/opsAddDCToCluster.html

The rebuild (final) stage fails with this message:

error: Error while rebuilding node: Stream failed
-- StackTrace --
java.lang.RuntimeException: Error while rebuilding node: Stream failed
        at org.apache.cassandra.service.StorageService.rebuild(StorageService.java:1109)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
....

Obviously there is a much larger stack trace.

This happens repeatedly when attempting to run the rebuild on just a single
node in the US DC (pointing at the EU DC). I have not yet tried any other
node from the US DC.
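For reference, the linked procedure runs nodetool rebuild on each node of the new DC, naming the existing DC as the streaming source. A minimal Python sketch of that step, assuming nodetool is on the node's PATH; the DC name passed in must be whatever nodetool status reports for the EU DC ("eu-west" below is only a placeholder), and the function names are hypothetical:

```python
import subprocess
from typing import List


def rebuild_command(source_dc: str) -> List[str]:
    """Build the nodetool call that streams this node's ranges from source_dc.

    The "--" ends option parsing so the DC name is never read as a flag.
    """
    if not source_dc:
        raise ValueError("source_dc must name an existing data center")
    return ["nodetool", "rebuild", "--", source_dc]


def run_rebuild(source_dc: str) -> int:
    """Run the rebuild on the local node and return nodetool's exit code."""
    return subprocess.run(rebuild_command(source_dc)).returncode


# Placeholder DC name; on a US node this would be something like:
# run_rebuild("eu-west")
```

Once the first node succeeds, the same call is repeated on the remaining nodes of the new DC.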

Is this a bug or a configuration error perhaps? I know people out there are
using AWS for Cassandra - how are you replicating across regions? Here are
two values I've tried modifying to no avail:

streaming_socket_timeout_in_ms
phi_convict_threshold

I picked these because both were referenced in various AWS-related Cassandra
sources on the web ;)

The amount of data being replicated is tiny - we're testing a tiny TitanDB
graph of no more than 100 edges and 100 nodes.

Can anyone point me in the direction of a correct solution and explanation?

Cheers,

Jim

--
Senior Code Pig
Industrial Light & Magic

Re: AWS multi-region DCs fail to rebuild

Posted by Nate McCall <na...@thelastpickle.com>.
>
> This happens repeatedly when attempting to run the rebuild on just a
> single node in the US DC (pointing at the EU DC). I have not yet tried
> any other node from the US DC.
>
> Is this a bug or a configuration error perhaps? I know people out there
> are using AWS for Cassandra - how are you replicating across regions?
>

There have been some edge cases here in the past:
https://issues.apache.org/jira/browse/CASSANDRA-4026

Check the AWS instance metadata
(http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html)
to confirm that the region and AZ it returns are consistent with your
keyspace declaration.
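With Ec2Snitch, both names are derived from the availability zone the metadata service returns. The Python sketch below mirrors that derivation as we read the 2.x Ec2Snitch source; treat it as an assumption to check against the version you run, and note that it ignores any dc_suffix setting. The metadata URL is the standard EC2 path for the instance's availability zone; the function names are hypothetical.

```python
import urllib.request

# Standard EC2 instance-metadata path for this instance's availability zone.
AZ_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"


def ec2snitch_names(az: str) -> tuple:
    """Return (dc, rack) derived from an AZ string, Ec2Snitch-style.

    Assumed rules (our reading of the 2.x source, dc_suffix ignored):
      rack = last "-"-separated token          ("us-west-2a" -> "2a")
      dc   = AZ without its trailing letter, and without a trailing
             "-1" region number as well      ("eu-west-1a" -> "eu-west")
    """
    rack = az.split("-")[-1]
    dc = az[:-1]              # drop the zone letter: "us-west-2a" -> "us-west-2"
    if dc.endswith("1"):
        dc = az[:-3]          # drop "-1a" too: "eu-west-1a" -> "eu-west"
    return dc, rack


def local_ec2snitch_names(timeout: float = 2.0) -> tuple:
    """Query the metadata service; only works when run on an EC2 instance."""
    with urllib.request.urlopen(AZ_URL, timeout=timeout) as resp:
        return ec2snitch_names(resp.read().decode().strip())
```

Under this reading, an Oregon node (us-west-2a) reports DC us-west-2 while an Ireland node (eu-west-1a) reports DC eu-west, and those exact strings are what the keyspace's replication settings must use.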

If nothing really sticks out, you may want to try GPFS
(GossipingPropertyFileSnitch) instead and fall back to setting the DC and
rack by hand.
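If you do switch to GPFS (GossipingPropertyFileSnitch, selected via endpoint_snitch in cassandra.yaml), each node takes its DC and rack from the dc= and rack= lines of its cassandra-rackdc.properties. A hypothetical Python helper that writes that file is sketched below. When switching snitches on nodes that already hold data, the names should stay identical to what nodetool status reports today so the keyspace declaration keeps matching.

```python
from pathlib import Path


def render_rackdc(dc: str, rack: str) -> str:
    """Return the body of cassandra-rackdc.properties for GPFS.

    Hypothetical helper: dc and rack are copied verbatim, so they must
    match the names used in the keyspace's replication settings.
    """
    if not dc.strip() or not rack.strip():
        raise ValueError("both dc and rack must be non-empty")
    return f"dc={dc}\nrack={rack}\n"


def write_rackdc(path: Path, dc: str, rack: str) -> None:
    """Write the rendered properties to path (the node's conf directory)."""
    path.write_text(render_rackdc(dc, rack))
```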

Per Rob's point about the bleeding edge, I'd be super curious whether the
existing setup works as-is on 2.1 or 2.0. I'd be willing to bet you are the
first person trying to make Ec2Snitch span regions on 2.2.


-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: AWS multi-region DCs fail to rebuild

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
If all the streams are failing, it might be due to the VPN tunnel linking
the two DCs or to a port issue; make sure these are OK (Cassandra uses port
7000 for internal communication by default).
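One way to rule out the port issue is to open a plain TCP connection from a node in one DC to each node in the other on the internode port. A minimal Python sketch, assuming the default storage_port of 7000 (internode SSL, which uses a different port, is not covered); the function name is hypothetical:

```python
import socket


def can_connect(host: str, port: int = 7000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves the port is reachable (security groups, VPN routing,
    firewalls); it does not exercise Cassandra's messaging protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False
```

Run it from a US node against each EU node's address, then the reverse: each Cassandra node opens its own outbound connections to its peers, so both directions must work.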

I am using this exact same setup and have never had streaming issues, except
during some big repairs. In my experience AWS is fine for streaming (I also
use Ec2Snitch).

What does nodetool status show?

Your issue is really weird. If this is only a test (not production), as I
think it is, maybe you should try the latest 2.1 version and see what
happens.

2015-07-31 10:54 GMT+02:00 James Vanns <jv...@gmail.com>:

>
>>
>> Why are you using 2.2? In general Cassandra, like most software, can lack
>> stability near the bleeding edge.
>>
>>
> I'll be honest, I didn't realise I was using the bleeding edge! I had
> begun playing around with dsc21 (from the datastax apt repo) but when that
> failed to install correctly due to dependency issues with the packaging, I
> switched to dsc22 as this fixed the packaging problem. I *assumed* that
> dsc22 was an upgraded official release :|
>
>
>> error: Error while rebuilding node: Stream failed
>>>
>>
>> This is a relatively common issue with streaming in AWS.
>>
>>
> Oh :(
>
>
>>> Is this a bug or a configuration error perhaps? I know people out there
>>> are using AWS for Cassandra - how are you replicating across regions?
>>> Here are two values I've tried modifying to no avail;
>>>
>>> streaming_socket_timeout_in_ms
>>> phi_convict_threshold
>>>
>>
>> What values did you try? What happened when you did this?
>>
>
> Yes, that would have been helpful I suppose ;) I increased
> phi_convict_threshold from 10 to 12 and set streaming_socket_timeout_in_ms
> to 5000. I applied these independently first, then together. Neither
> helped; the same error was reported.
>
>
>>
>>> As both were referenced in various AWS related Cassandra sources on the
>>> web ;)
>>>
>>> The amount of data being replicated would be tiny - we're testing a tiny
>>> TitanDB graph of no more than 100 edges and 100 nodes.
>>>
>>
>> With data that small, you don't really need to use "rebuild," you could
>> just bootstrap the nodes one at a time. Of course bootstrap uses the same
>> streaming as rebuild so that doesn't really help, lol..
>>
>
> Thanks for your help.
>
>
>
>> For issues like this, #cassandra on freenode IRC is often a good forum to
>> get help.
>>
>> =Rob
>>
>>
>
>
>

Re: AWS multi-region DCs fail to rebuild

Posted by James Vanns <jv...@gmail.com>.
>
>
>
> Why are you using 2.2? In general Cassandra, like most software, can lack
> stability near the bleeding edge.
>
>
I'll be honest, I didn't realise I was using the bleeding edge! I had begun
playing around with dsc21 (from the datastax apt repo) but when that failed
to install correctly due to dependency issues with the packaging, I
switched to dsc22 as this fixed the packaging problem. I *assumed* that
dsc22 was an upgraded official release :|


> error: Error while rebuilding node: Stream failed
>>
>
> This is a relatively common issue with streaming in AWS.
>
>
Oh :(


>> Is this a bug or a configuration error perhaps? I know people out there
>> are using AWS for Cassandra - how are you replicating across regions?
>> Here are two values I've tried modifying to no avail;
>>
>> streaming_socket_timeout_in_ms
>> phi_convict_threshold
>>
>
> What values did you try? What happened when you did this?
>

Yes, that would have been helpful I suppose ;) I increased
phi_convict_threshold from 10 to 12 and set streaming_socket_timeout_in_ms
to 5000. I applied these independently first, then together. Neither
helped; the same error was reported.
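For context on what raising phi_convict_threshold changes: as we understand Cassandra's failure detector, it treats heartbeat inter-arrival times as exponentially distributed, so phi rises linearly with the silence t since the last heartbeat, phi = t / (mean * ln 10), and a peer is convicted once phi exceeds the threshold. The Python sketch below assumes that model and a hypothetical 1-second mean heartbeat interval; the function names are illustrative.

```python
import math


def phi(elapsed_s: float, mean_interval_s: float) -> float:
    """Phi under an exponential model: -log10(exp(-t / mean))."""
    return elapsed_s / (mean_interval_s * math.log(10))


def seconds_until_convicted(threshold: float, mean_interval_s: float = 1.0) -> float:
    """Silence after which phi reaches the conviction threshold."""
    return threshold * mean_interval_s * math.log(10)


# The two values tried in this thread (assumed 1 s mean interval).
for threshold in (10, 12):
    print(f"phi_convict_threshold={threshold}: "
          f"~{seconds_until_convicted(threshold):.1f} s of silence")
# prints ~23.0 s for 10 and ~27.6 s for 12
```

Under these assumptions, going from 10 to 12 tolerates only about 4.6 more seconds of silence. Separately, streaming_socket_timeout_in_ms is in milliseconds, so 5000 is a 5-second socket timeout.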


>
>> As both were referenced in various AWS related Cassandra sources on the
>> web ;)
>>
>> The amount of data being replicated would be tiny - we're testing a tiny
>> TitanDB graph of no more than 100 edges and 100 nodes.
>>
>
> With data that small, you don't really need to use "rebuild," you could
> just bootstrap the nodes one at a time. Of course bootstrap uses the same
> streaming as rebuild so that doesn't really help, lol..
>

Thanks for your help.



> For issues like this, #cassandra on freenode IRC is often a good forum to
> get help.
>
> =Rob
>
>



Re: AWS multi-region DCs fail to rebuild

Posted by Robert Coli <rc...@eventbrite.com>.
On Thu, Jul 30, 2015 at 9:23 AM, James Vanns <jv...@gmail.com> wrote:

> Hi. First, some details;
>

Thanks for including the relevant details in your initial post, it helps a
lot.

Why are you using 2.2? In general Cassandra, like most software, can lack
stability near the bleeding edge.

> error: Error while rebuilding node: Stream failed
>

This is a relatively common issue with streaming in AWS.


> Is this a bug or a configuration error perhaps? I know people out there
> are using AWS for Cassandra - how are you replicating across regions?
> Here are two values I've tried modifying to no avail;
>
> streaming_socket_timeout_in_ms
> phi_convict_threshold
>

What values did you try? What happened when you did this?


> As both were referenced in various AWS related Cassandra sources on the
> web ;)
>
> The amount of data being replicated would be tiny - we're testing a tiny
> TitanDB graph of no more than 100 edges and 100 nodes.
>

With data that small, you don't really need to use "rebuild," you could
just bootstrap the nodes one at a time. Of course bootstrap uses the same
streaming as rebuild so that doesn't really help, lol..

For issues like this, #cassandra on freenode IRC is often a good forum to
get help.

=Rob