Posted to user@cassandra.apache.org by Peter Fales <Pe...@alcatel-lucent.com> on 2010/09/01 18:26:12 UTC

Re: Cassandra on AWS across Regions

A few months ago, there was a thread on this list about using Cassandra
across multiple EC2 regions.   I was interested in doing
the same thing, and managed to make it work.

To implement this, there are basically two things that need to change.
First, in storage-conf.xml, I used the "external" IP addresses for
<ListenAddress> and <Seed> - these external addresses are needed for 
the machines in different regions to talk to each other.   However, they
also work within regions.  

However, that doesn't quite work with the stock Cassandra, as it will
try to bind and listen on those addresses and give up because they
don't appear to be valid network addresses.  This patch causes 
Cassandra to listen on the local network, rather than the <ListenAddress>
defined in the config file.   (This is not a completely general
solution.  It assumes that there is only one local network, and that the
default network is the one to use, but - at least for EC2 - that assumption
should be OK.)
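
A minimal sketch of the bind-address decision described above, written in Java like the patch. It assumes a hypothetical helper, BindAddressChooser (not Cassandra code), that keeps the configured address when some local interface carries it and otherwise falls back to a caller-supplied local address. The patch in this message skips that check and always binds InetAddress.getLocalHost(). 203.0.113.10 is a documentation address used here as a stand-in for an EC2 public IP.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper (not Cassandra code): pick the address to bind when the
// configured ListenAddress may be a NATed "external" address.
final class BindAddressChooser {

    // True if the address is the wildcard or is assigned to a local interface.
    static boolean isLocalAddress(InetAddress addr) throws SocketException {
        return addr.isAnyLocalAddress() || NetworkInterface.getByInetAddress(addr) != null;
    }

    // Keep the configured address when it is local; otherwise use the fallback.
    static InetAddress chooseBindAddress(InetAddress configured, InetAddress fallback)
            throws SocketException {
        return isLocalAddress(configured) ? configured : fallback;
    }

    public static void main(String[] args) throws SocketException, UnknownHostException {
        // Documentation (TEST-NET-3) address standing in for an EC2 public IP;
        // no local interface carries it, so the fallback is chosen.
        InetAddress external = InetAddress.getByName("203.0.113.10");
        InetAddress fallback = InetAddress.getLoopbackAddress();
        System.out.println("bind " + chooseBindAddress(external, fallback).getHostAddress());
    }
}
```

Checking the interfaces first means a node whose configured address really is local keeps its old behaviour; only the NAT case falls back.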

Part of my motivation for posting here is to solicit feedback on the 
third part of the patch.   I was able to get my two-region cluster 
up and running by patching just the first two files.   The third
change may be needed under certain conditions, but I never seemed to
hit that code.

Here's the source patch:


diff -ur orig/apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/MessagingService.java apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/MessagingService.java
--- orig/apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/MessagingService.java	2010-08-16 17:48:02.000000000 -0500
+++ apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/MessagingService.java	2010-09-01 10:05:34.000000000 -0500
@@ -147,7 +147,16 @@
         ServerSocketChannel serverChannel = ServerSocketChannel.open();
         final ServerSocket ss = serverChannel.socket();
         ss.setReuseAddress(true);
+
+/* OLD 
         ss.bind(new InetSocketAddress(localEp, DatabaseDescriptor.getStoragePort()));
+*/
+	/* In order to allow using Amazon EC2 across regions, we listen
+	 * on our local address, rather than the "public" IP address
+	 * defined in storage-conf.xml 
+	 */
+        ss.bind(new InetSocketAddress(InetAddress.getLocalHost(), DatabaseDescriptor.getStoragePort()));
+
         socketThread = new SocketThread(ss, "ACCEPT-" + localEp);
         socketThread.start();
         listenGate.signalAll();
diff -ur orig/apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/OutboundTcpConnection.java apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
--- orig/apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/OutboundTcpConnection.java	2010-07-27 16:09:18.000000000 -0500
+++ apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/OutboundTcpConnection.java	2010-09-01 10:09:31.000000000 -0500
@@ -149,7 +149,16 @@
             try
             {
                 // zero means 'bind on any available port.'
+
+	        /* In order to allow using Amazon EC2 across regions, we 
+		 * bind the outgoing socket to our local address, rather than the
+		 * "public" IP address defined in storage-conf.xml
+	         */
+
+/* OLD
                 socket = new Socket(endpoint, DatabaseDescriptor.getStoragePort(), FBUtilities.getLocalAddress(), 0);
+*/
+                socket = new Socket(endpoint, DatabaseDescriptor.getStoragePort(), InetAddress.getLocalHost(), 0);
                 socket.setTcpNoDelay(true);
                 output = new DataOutputStream(socket.getOutputStream());
                 return true;
diff -ur orig/apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/FileStreamTask.java apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/FileStreamTask.java
--- orig/apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/FileStreamTask.java	2010-05-28 11:23:04.000000000 -0500
+++ apache-cassandra-0.6.5-src/src/java/org/apache/cassandra/net/FileStreamTask.java	2010-09-01 10:07:43.000000000 -0500
@@ -122,6 +122,14 @@
     {
         SocketChannel channel = SocketChannel.open();
         // force local binding on correctly specified interface.
+
+	/* When using Amazon EC2 "public" IP addresses, we probably
+	 * won't be able to bind to the address.  However, I don't see
+	 * this code getting hit, and I'm not sure under what circumstances
+	 * it would get run.
+	 */
+System.out.println("FIXME - probably can't bind to this address: "+FBUtilities.getLocalAddress()+"\n");
+
         channel.socket().bind(new InetSocketAddress(FBUtilities.getLocalAddress(), 0));
         int attempts = 0;
         while (true)


-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: Peter.Fales@alcatel-lucent.com
Phone: 630 979 8031

Re: Cassandra on AWS across Regions

Posted by Benjamin Black <b...@b3k.us>.
On Thu, Sep 2, 2010 at 5:52 AM, Phil Stanhope <st...@gmail.com> wrote:
> Ben, can you elaborate on some infrastructure topology issues that would
> break this approach?
>

As noted, the naive approach results in nodes behind the same NAT
having to communicate with each other through that NAT rather than
directly.  You can use different property files for the property snitch on
different nodes, as that is directly encoding topology.  You could do
the same with /etc/hosts.  You could do the same with DNS.  The
problem is that in all these cases you have a different view of the
world depending on where you are.  Does this node have the right
information for connecting to local nodes and remote nodes?  Is it
failing to connect to some other node because of a hostname resolution
failure, or because it has the wrong topology information, or ...?

And this only assumes 1:1 NAT.  What is the solution for PAT (which is
quite common)?  It's a deep dark hole of edge cases.  I would rather
have a dead simple 80% solution than a 100% solution with dynamics I
can't understand.


b

Re: Cassandra on AWS across Regions

Posted by Phil Stanhope <st...@gmail.com>.
Ben, can you elaborate on some infrastructure topology issues that would
break this approach?

On Wed, Sep 1, 2010 at 6:25 PM, Benjamin Black <b...@b3k.us> wrote:

> On Wed, Sep 1, 2010 at 4:16 PM, Andres March <am...@qualcomm.com> wrote:
> > I didn't have anything specific in mind. I understand all the issues around
> > DNS and not advocating only supporting hostnames (just thought it would be a
> > nice option).  I also wouldn't expect name resolution to be done all the
> > time, only when the node is first being started or during initial discovery.
> >
>
> All nodes would have to resolve whenever topology changed.
>
> > One use case might be when nodes are spread out over multiple networks as
> > the poster describes, nodes on the same network on a private interface could
> > incur less network overhead than if they go out through the public
> > interface.  I'm not sure that this is even possible given that cassandra
> > binds to only one interface.
> >
>
> This case is not actually solved more simply by gossiping hostnames.
> It requires much more in-depth understanding of infrastructure
> topology.
>
>
> b
>

Re: Cassandra on AWS across Regions

Posted by Benjamin Black <b...@b3k.us>.
On Wed, Sep 1, 2010 at 4:16 PM, Andres March <am...@qualcomm.com> wrote:
> I didn't have anything specific in mind. I understand all the issues around
> DNS and not advocating only supporting hostnames (just thought it would be a
> nice option).  I also wouldn't expect name resolution to be done all the
> time, only when the node is first being started or during initial discovery.
>

All nodes would have to resolve whenever topology changed.

> One use case might be when nodes are spread out over multiple networks as
> the poster describes, nodes on the same network on a private interface could
> incur less network overhead than if they go out through the public
> interface.  I'm not sure that this is even possible given that cassandra
> binds to only one interface.
>

This case is not actually solved more simply by gossiping hostnames.
It requires much more in-depth understanding of infrastructure
topology.


b

Re: Cassandra on AWS across Regions

Posted by Andres March <am...@qualcomm.com>.
I didn't have anything specific in mind. I understand all the issues 
around DNS and not advocating only supporting hostnames (just thought it 
would be a nice option).  I also wouldn't expect name resolution to be 
done all the time, only when the node is first being started or during 
initial discovery.

One use case might be when nodes are spread out over multiple networks 
as the poster describes, nodes on the same network on a private 
interface could incur less network overhead than if they go out through 
the public interface.  I'm not sure that this is even possible given 
that cassandra binds to only one interface.
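
A sketch of the per-peer choice this use case implies, assuming (hypothetically) that every node already knows each peer's region and both of its addresses. PeerAddresses and PeerAddressSelector are illustrative names, and the region strings and IPs are examples only; supplying that topology knowledge is the hard part this thread debates.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical per-peer record (not a Cassandra type): a node's region plus
// its private (internal) and public (external, NATed) addresses.
final class PeerAddresses {
    final String region;
    final InetAddress privateAddress;
    final InetAddress publicAddress;

    PeerAddresses(String region, InetAddress privateAddress, InetAddress publicAddress) {
        this.region = region;
        this.privateAddress = privateAddress;
        this.publicAddress = publicAddress;
    }
}

final class PeerAddressSelector {
    // Same region: use the private network and skip the trip through NAT.
    // Different region: the private address is unreachable, so use the public one.
    static InetAddress addressToConnect(String localRegion, PeerAddresses peer) {
        return localRegion.equals(peer.region) ? peer.privateAddress : peer.publicAddress;
    }

    public static void main(String[] args) throws UnknownHostException {
        PeerAddresses peer = new PeerAddresses("us-east-1",
                InetAddress.getByName("10.0.0.12"), InetAddress.getByName("203.0.113.12"));
        System.out.println(addressToConnect("us-east-1", peer).getHostAddress()); // 10.0.0.12
        System.out.println(addressToConnect("eu-west-1", peer).getHostAddress()); // 203.0.113.12
    }
}
```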


On 09/01/2010 03:23 PM, Benjamin Black wrote:
> On Wed, Sep 1, 2010 at 3:18 PM, Andres March <am...@qualcomm.com> wrote:
>> I thought you might say that.  Is there some reason to gossip IP addresses
>> vs hostnames?  I thought that layer of indirection could be useful in more
>> than just this use case.
>>
> The trade-off for that flexibility is that nodes are now dependent on
> name resolution during normal operation, rather than only at startup.
> The opportunities for horribly confusing failure scenarios are
> numerous and frightening.  Other than NAT (which can clearly be dealt
> with without gossiping hostnames), what do you think this would
> enable?
>
>
> b

-- 
*Andres March*
amarch@qualcomm.com
Qualcomm Internet Services

Re: Cassandra on AWS across Regions

Posted by Benjamin Black <b...@b3k.us>.
On Wed, Sep 1, 2010 at 3:18 PM, Andres March <am...@qualcomm.com> wrote:
> I thought you might say that.  Is there some reason to gossip IP addresses
> vs hostnames?  I thought that layer of indirection could be useful in more
> than just this use case.
>

The trade-off for that flexibility is that nodes are now dependent on
name resolution during normal operation, rather than only at startup.
The opportunities for horribly confusing failure scenarios are
numerous and frightening.  Other than NAT (which can clearly be dealt
with without gossiping hostnames), what do you think this would
enable?
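
A minimal sketch of the "resolve only at startup" model, assuming a hypothetical SeedResolver that turns configured seed names into IP addresses exactly once. After that step only the resulting addresses would be used and gossiped, so DNS is no longer on the path of normal operation.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical startup step: turn configured seed names into IP addresses once.
// Code running after startup sees only the returned addresses, so a later DNS
// outage or changed record does not alter this list.
final class SeedResolver {
    static List<InetAddress> resolveOnce(List<String> seedNames) throws UnknownHostException {
        List<InetAddress> resolved = new ArrayList<>();
        for (String name : seedNames) {
            // A bad name fails here, at startup, rather than mid-operation.
            resolved.add(InetAddress.getByName(name));
        }
        return Collections.unmodifiableList(resolved);
    }

    public static void main(String[] args) throws UnknownHostException {
        // IP literals are only format-checked; real hostnames would hit DNS here.
        List<InetAddress> seeds = resolveOnce(Arrays.asList("203.0.113.10", "198.51.100.20"));
        for (InetAddress seed : seeds) {
            System.out.println(seed.getHostAddress());
        }
    }
}
```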


b

Re: Cassandra on AWS across Regions

Posted by Andres March <am...@qualcomm.com>.
I thought you might say that.  Is there some reason to gossip IP 
addresses vs hostnames?  I thought that layer of indirection could be 
useful in more than just this use case.

I still think it is a good idea to have a separate bind vs gossip config 
param.
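
A sketch of what separate bind and gossip parameters could look like, assuming a hypothetical NodeAddressConfig class; the field names are illustrative and do not correspond to settings in storage-conf.xml. When no gossip address is given, it advertises the bound address, which matches the single-address behaviour described in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical configuration holder (illustrative names, not real Cassandra
// settings) with separate "bind" and "gossip" addresses.
final class NodeAddressConfig {
    private final InetAddress bindAddress;   // must be on a local interface
    private final InetAddress gossipAddress; // what other nodes are told to connect to

    NodeAddressConfig(InetAddress bindAddress, InetAddress gossipAddress) {
        if (bindAddress == null) {
            throw new IllegalArgumentException("bind address is required");
        }
        this.bindAddress = bindAddress;
        // No separate gossip address: advertise the address we bind.
        this.gossipAddress = (gossipAddress != null) ? gossipAddress : bindAddress;
    }

    InetAddress bindAddress() { return bindAddress; }

    InetAddress gossipAddress() { return gossipAddress; }

    public static void main(String[] args) throws UnknownHostException {
        // EC2-style split: bind the private address, gossip the public one.
        NodeAddressConfig ec2 = new NodeAddressConfig(
                InetAddress.getByName("10.0.0.5"), InetAddress.getByName("203.0.113.5"));
        System.out.println("bind " + ec2.bindAddress().getHostAddress()
                + ", gossip " + ec2.gossipAddress().getHostAddress());
    }
}
```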

On 09/01/2010 03:10 PM, Benjamin Black wrote:
> It's not gossiping hostnames, it's gossiping IP addresses.  The
> purpose of Peter's patch is to have the system gossip its external
> address (so other nodes can connect), but bind its internal address.
> As Edward notes, it helps with NAT in general, not just EC2.  Not
> perfect, but a great start.
>
>
> b

-- 
*Andres March*
amarch@qualcomm.com
Qualcomm Internet Services

Re: Cassandra on AWS across Regions

Posted by Benjamin Black <b...@b3k.us>.
It's not gossiping hostnames, it's gossiping IP addresses.  The
purpose of Peter's patch is to have the system gossip its external
address (so other nodes can connect), but bind its internal address.
As Edward notes, it helps with NAT in general, not just EC2.  Not
perfect, but a great start.


b

On Wed, Sep 1, 2010 at 2:57 PM, Andres March <am...@qualcomm.com> wrote:
> Is it not possible to put the external host name in cassandra.yaml and add a
> host entry in /etc/hosts for that name to resolve to the local interface?

Re: Cassandra on AWS across Regions

Posted by Andres March <am...@qualcomm.com>.
Is it not possible to put the external host name in cassandra.yaml and 
add a host entry in /etc/hosts for that name to resolve to the local 
interface?

On 09/01/2010 01:24 PM, Benjamin Black wrote:
> The issue is this:
>
> The IP address by which an EC2 instance is known _externally_ is not
> actually on the instance itself (the address being translated), and
> the _internal_ address is not accessible across regions.  Since you
> can't bind a specific address that is not on one of your local
> interfaces, and Cassandra nodes don't have a notion of internal vs
> external you need a mechanism by which a node is told to bind one IP
> (the internal one), while it gossips another (the external one).
>
> I like what this patch does conceptually, but would prefer
> configuration options to cause it to happen (obviously a much larger
> patch).  Very cool, Peter!
>
>
> b

-- 
*Andres March*
amarch@qualcomm.com
Qualcomm Internet Services

Re: Cassandra on AWS across Regions

Posted by Joe Stump <jo...@joestump.net>.
On Sep 1, 2010, at 1:42 PM, Peter Fales wrote:

> I probably should have made it clear that I wasn't proposing this as
> an official patch (as you point out, it's not general enough for 
> production use).   I'm just looking for feedback on the concept (thanks!)
> and thought it might possibly be useful to other folks trying to
> do the same thing.

We're extremely interested in this patch and helping out. Let me know if you need resources. SimpleGeo is ready, willing, and able to help as we are close to undertaking a similar endeavor. 

--Joe


Re: Cassandra on AWS across Regions

Posted by Edward Capriolo <ed...@gmail.com>.
On Wed, Sep 1, 2010 at 4:42 PM, Peter Fales
<Pe...@alcatel-lucent.com> wrote:
> I probably should have made it clear that I wasn't proposing this as
> an official patch (as you point out, it's not general enough for
> production use).   I'm just looking for feedback on the concept (thanks!)
> and thought it might possibly be useful to other folks trying to
> do the same thing.

Even though performance will be impacted, this essentially allows
Cassandra to run over network address translated (NAT) IPs. Not a
bad thing.

Re: Cassandra on AWS across Regions

Posted by Peter Fales <Pe...@alcatel-lucent.com>.
I probably should have made it clear that I wasn't proposing this as
an official patch (as you point out, it's not general enough for 
production use).   I'm just looking for feedback on the concept (thanks!)
and thought it might possibly be useful to other folks trying to
do the same thing.


On Wed, Sep 01, 2010 at 03:24:44PM -0500, Benjamin Black wrote:
> The issue is this:
> 
> The IP address by which an EC2 instance is known _externally_ is not
> actually on the instance itself (the address being translated), and
> the _internal_ address is not accessible across regions.  Since you
> can't bind a specific address that is not on one of your local
> interfaces, and Cassandra nodes don't have a notion of internal vs
> external you need a mechanism by which a node is told to bind one IP
> (the internal one), while it gossips another (the external one).
> 
> I like what this patch does conceptually, but would prefer
> configuration options to cause it to happen (obviously a much larger
> patch).  Very cool, Peter!
> 
> 
> b

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: Peter.Fales@alcatel-lucent.com
Phone: 630 979 8031

Re: Cassandra on AWS across Regions

Posted by Jonathan Ellis <jb...@gmail.com>.
+1

On Wed, Sep 1, 2010 at 1:24 PM, Benjamin Black <b...@b3k.us> wrote:
> The issue is this:
>
> The IP address by which an EC2 instance is known _externally_ is not
> actually on the instance itself (the address being translated), and
> the _internal_ address is not accessible across regions.  Since you
> can't bind a specific address that is not on one of your local
> interfaces, and Cassandra nodes don't have a notion of internal vs
> external you need a mechanism by which a node is told to bind one IP
> (the internal one), while it gossips another (the external one).
>
> I like what this patch does conceptually, but would prefer
> configuration options to cause it to happen (obviously a much larger
> patch).  Very cool, Peter!
>
>
> b

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Cassandra on AWS across Regions

Posted by Benjamin Black <b...@b3k.us>.
The issue is this:

The IP address by which an EC2 instance is known _externally_ is not
actually on the instance itself (the address being translated), and
the _internal_ address is not accessible across regions.  Since you
can't bind a specific address that is not on one of your local
interfaces, and Cassandra nodes don't have a notion of internal vs
external you need a mechanism by which a node is told to bind one IP
(the internal one), while it gossips another (the external one).
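
A small check that illustrates this constraint, assuming a typical Linux host: binding a ServerSocket to an address that no local interface carries fails, while loopback succeeds. BindCheck is a hypothetical name, 203.0.113.10 is a documentation address standing in for an EC2 public IP, and the exact exception message depends on the OS.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.UnknownHostException;

// Tries to bind a listening socket to a specific address, much as
// MessagingService binds its ServerSocket, and reports whether it worked.
final class BindCheck {
    static boolean canBind(InetAddress addr) {
        try (ServerSocket ss = new ServerSocket()) {
            ss.setReuseAddress(true);
            // Port 0 asks for any free port so the check cannot collide with a running node.
            ss.bind(new InetSocketAddress(addr, 0));
            return true;
        } catch (IOException e) {
            // For an address no local interface carries, Linux typically reports
            // java.net.BindException: Cannot assign requested address.
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(canBind(InetAddress.getLoopbackAddress()));
        System.out.println(canBind(InetAddress.getByName("203.0.113.10")));
    }
}
```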

I like what this patch does conceptually, but would prefer
configuration options to cause it to happen (obviously a much larger
patch).  Very cool, Peter!


b

On Wed, Sep 1, 2010 at 1:10 PM, Andres March <am...@qualcomm.com> wrote:
> Could you explain this point further?  Was there an exception?
>
> On 09/01/2010 09:26 AM, Peter Fales wrote:
>
> that doesn't quite work with the stock Cassandra, as it will
> try to bind and listen on those addresses and give up because they
> don't appear to be valid network addresses.

Re: Cassandra on AWS across Regions

Posted by Andres March <am...@qualcomm.com>.
Could you explain this point further?  Was there an exception?

On 09/01/2010 09:26 AM, Peter Fales wrote:
> that doesn't quite work with the stock Cassandra, as it will
> try to bind and listen on those addresses and give up because they
> don't appear to be valid network addresses.

-- 
*Andres March*
amarch@qualcomm.com
Qualcomm Internet Services