Posted to user@cassandra.apache.org by Scott Fines <Sc...@nisc.coop> on 2011/10/10 18:47:14 UTC

MapReduce with two ethernet cards

Hi all,

This may be a silly question, but I'm at a bit of a loss, and was hoping for some help.

I have a Cassandra cluster set up with two NICs--one for internal communication between Cassandra machines (10.1.1.*), and one to respond to Thrift RPC (172.28.*.*).

I also have a Hadoop cluster set up, which, for unrelated reasons, has to remain separate from Cassandra, so I've written a little MapReduce job to copy data from Cassandra to Hadoop. However, when I try to run my job, I get

java.io.IOException: failed connecting to all endpoints 10.1.1.24,10.1.1.17,10.1.1.16

which is puzzling to me. It seems like the MR job is attempting to connect to the internal communication IPs instead of the external Thrift IPs. Since I set up a firewall to block external access to the internal IPs of Cassandra, this is obviously going to fail.

So my question is: why does Cassandra MR seem to be grabbing the listen_address instead of the Thrift one? Presuming it's not a funky configuration error or something on my part, is that strictly necessary? All told, I'd prefer that it connect to the Thrift IPs, but if it can't, should I open up port 7000 or port 9160 between Hadoop and Cassandra?

Thanks for your help,

Scott
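
(Note on the port question above: Thrift clients, including Hadoop jobs that read from Cassandra, talk to rpc_port, which defaults to 9160; port 7000 is the default storage_port used only between Cassandra nodes. A minimal sketch for checking, from a Hadoop node, which addresses actually accept connections on the Thrift port is given here. It is a plain TCP probe, not part of Cassandra or Hadoop; the function name check_reachable and the timeout value are assumptions made for illustration, and the IPs are the ones from the error message.)

```python
import socket
from typing import Dict, Iterable


def check_reachable(hosts: Iterable[str], port: int,
                    timeout: float = 3.0) -> Dict[str, bool]:
    """Try a TCP connect to each host on the given port.

    Returns a mapping of host -> True if the connection was accepted,
    False if it was refused, filtered by a firewall, or timed out.
    """
    results = {}
    for host in hosts:
        try:
            # create_connection raises OSError on refusal or timeout.
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:
            results[host] = False
    return results


if __name__ == "__main__":
    # Addresses taken from the IOException in the post; 9160 is the
    # default Thrift rpc_port.
    internal_ips = ["10.1.1.24", "10.1.1.17", "10.1.1.16"]
    for host, ok in check_reachable(internal_ips, 9160).items():
        print(host, "reachable" if ok else "NOT reachable")
```

Run from a Hadoop task node, a "NOT reachable" result for every 10.1.1.* address would be consistent with the firewall being the cause of the exception rather than a problem in the job itself.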



RE: MapReduce with two ethernet cards

Posted by Scott Fines <Sc...@nisc.coop>.
Looks like that did it, thanks!

Scott
________________________________________
From: Brandon Williams [driftx@gmail.com]
Sent: Thursday, October 13, 2011 2:16 PM
To: user@cassandra.apache.org
Subject: Re: MapReduce with two ethernet cards

On Thu, Oct 13, 2011 at 1:17 PM, Scott Fines <Sc...@nisc.coop> wrote:
> When I look at the source for ColumnFamilyInputFormat, it appears to call client.describe_ring; when you do the equivalent call with nodetool, you get the 10.1.1.* addresses. This suggests that I should open up the firewall and contact those IPs instead of the normal Thrift IPs.
>
> That leads me to think that I need to have Thrift listening on both IPs, though. Would that be the case?

My mistake, I thought I'd committed this:
https://issues.apache.org/jira/browse/CASSANDRA-3214

Can you see if that solves your issue?

-Brandon

Re: MapReduce with two ethernet cards

Posted by Brandon Williams <dr...@gmail.com>.
On Thu, Oct 13, 2011 at 1:17 PM, Scott Fines <Sc...@nisc.coop> wrote:
> When I look at the source for ColumnFamilyInputFormat, it appears to call client.describe_ring; when you do the equivalent call with nodetool, you get the 10.1.1.* addresses. This suggests that I should open up the firewall and contact those IPs instead of the normal Thrift IPs.
>
> That leads me to think that I need to have Thrift listening on both IPs, though. Would that be the case?

My mistake, I thought I'd committed this:
https://issues.apache.org/jira/browse/CASSANDRA-3214

Can you see if that solves your issue?

-Brandon
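
(For readers following along: the behaviour discussed in this thread can be pictured as choosing, for each node in a token range, which address the Hadoop client should dial. The sketch below assumes that describe_ring reports both the internal endpoints and the rpc endpoints for a range, and that a client prefers the rpc address, falling back to the internal one when the rpc address is missing or 0.0.0.0. It is an illustration only, not the ColumnFamilyInputFormat source; candidate_hosts is a hypothetical helper, and the 172.28.0.x addresses are made up to fit the 172.28.*.* range mentioned in the thread.)

```python
from typing import List, Optional, Sequence

# "Bind everything": valid for a server to listen on, but not an
# address a remote client can connect to.
UNSPECIFIED = "0.0.0.0"


def candidate_hosts(endpoints: Sequence[str],
                    rpc_endpoints: Sequence[Optional[str]]) -> List[str]:
    """Pick one address per node, preferring the rpc address.

    endpoints     -- internal (listen_address) IPs, one per node
    rpc_endpoints -- rpc_address IPs in the same node order
    """
    hosts = []
    for i, internal_ip in enumerate(endpoints):
        rpc_ip = rpc_endpoints[i] if i < len(rpc_endpoints) else None
        if rpc_ip is None or rpc_ip == UNSPECIFIED:
            # No usable rpc address: fall back to the internal one.
            hosts.append(internal_ip)
        else:
            hosts.append(rpc_ip)
    return hosts


if __name__ == "__main__":
    internal = ["10.1.1.24", "10.1.1.17", "10.1.1.16"]
    rpc = ["172.28.0.24", "172.28.0.17", "0.0.0.0"]  # hypothetical values
    print(candidate_hosts(internal, rpc))
```

Under these assumptions, a node whose rpc_address is 0.0.0.0 still ends up being contacted on its 10.1.1.* address, which would explain why that setting cannot work while the internal network is firewalled off.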

RE: MapReduce with two ethernet cards

Posted by Scott Fines <Sc...@nisc.coop>.
When I look at the source for ColumnFamilyInputFormat, it appears to call client.describe_ring; when you do the equivalent call with nodetool, you get the 10.1.1.* addresses. This suggests that I should open up the firewall and contact those IPs instead of the normal Thrift IPs.

That leads me to think that I need to have Thrift listening on both IPs, though. Would that be the case?

Scott
________________________________________
From: Scott Fines [Scott.Fines@nisc.coop]
Sent: Thursday, October 13, 2011 12:40 PM
To: user@cassandra.apache.org
Subject: RE: MapReduce with two ethernet cards

The listen_address on all machines is set to the 10.1.1.* addresses, while the Thrift rpc_address is set to the 172.28.* addresses.

________________________________________
From: Brandon Williams [driftx@gmail.com]
Sent: Thursday, October 13, 2011 12:28 PM
To: user@cassandra.apache.org
Subject: Re: MapReduce with two ethernet cards

What is your rpc_address set to?  If it's 0.0.0.0 (bind everything)
then that's not going to work if listen_address is blocked.

-Brandon

On Thu, Oct 13, 2011 at 11:13 AM, Scott Fines <Sc...@nisc.coop> wrote:
> I upgraded to cassandra 0.8.7, and the problem persists.
>
> Scott
> ________________________________________
> From: Brandon Williams [driftx@gmail.com]
> Sent: Monday, October 10, 2011 12:28 PM
> To: user@cassandra.apache.org
> Subject: Re: MapReduce with two ethernet cards
>
> On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines <Sc...@nisc.coop> wrote:
>> Hi all,
>> This may be a silly question, but I'm at a bit of a loss, and was hoping for
>> some help.
>> I have a Cassandra cluster set up with two NICs--one for internal
>> communication between cassandra machines (10.1.1.*), and one to respond to
>> Thrift RPC (172.28.*.*).
>> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
>> remain separate from Cassandra, so I've written a little MapReduce job to
>> copy data from Cassandra to Hadoop. However, when I try to run my job, I
>> get
>> java.io.IOException: failed connecting to all endpoints
>> 10.1.1.24,10.1.1.17,10.1.1.16
>> which is puzzling to me. It seems like the MR is attempting to connect to
>> the internal communication IPs instead of the external Thrift IPs. Since I
>> set up a firewall to block external access to the internal IPs of Cassandra,
>> this is obviously going to fail.
>> So my question is: why does Cassandra MR seem to be grabbing the
>> listen_address instead of the Thrift one? Presuming it's not a funky
>> configuration error or something on my part, is that strictly necessary? All
>> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
>> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
>> Thanks for your help,
>> Scott
>
> Your cassandra is old, upgrade to the latest version.
>
> -Brandon
>

RE: MapReduce with two ethernet cards

Posted by Scott Fines <Sc...@nisc.coop>.
The listen_address on all machines is set to the 10.1.1.* addresses, while the Thrift rpc_address is set to the 172.28.* addresses.

________________________________________
From: Brandon Williams [driftx@gmail.com]
Sent: Thursday, October 13, 2011 12:28 PM
To: user@cassandra.apache.org
Subject: Re: MapReduce with two ethernet cards

What is your rpc_address set to?  If it's 0.0.0.0 (bind everything)
then that's not going to work if listen_address is blocked.

-Brandon

On Thu, Oct 13, 2011 at 11:13 AM, Scott Fines <Sc...@nisc.coop> wrote:
> I upgraded to cassandra 0.8.7, and the problem persists.
>
> Scott
> ________________________________________
> From: Brandon Williams [driftx@gmail.com]
> Sent: Monday, October 10, 2011 12:28 PM
> To: user@cassandra.apache.org
> Subject: Re: MapReduce with two ethernet cards
>
> On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines <Sc...@nisc.coop> wrote:
>> Hi all,
>> This may be a silly question, but I'm at a bit of a loss, and was hoping for
>> some help.
>> I have a Cassandra cluster set up with two NICs--one for internal
>> communication between cassandra machines (10.1.1.*), and one to respond to
>> Thrift RPC (172.28.*.*).
>> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
>> remain separate from Cassandra, so I've written a little MapReduce job to
>> copy data from Cassandra to Hadoop. However, when I try to run my job, I
>> get
>> java.io.IOException: failed connecting to all endpoints
>> 10.1.1.24,10.1.1.17,10.1.1.16
>> which is puzzling to me. It seems like the MR is attempting to connect to
>> the internal communication IPs instead of the external Thrift IPs. Since I
>> set up a firewall to block external access to the internal IPs of Cassandra,
>> this is obviously going to fail.
>> So my question is: why does Cassandra MR seem to be grabbing the
>> listen_address instead of the Thrift one? Presuming it's not a funky
>> configuration error or something on my part, is that strictly necessary? All
>> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
>> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
>> Thanks for your help,
>> Scott
>
> Your cassandra is old, upgrade to the latest version.
>
> -Brandon
>

Re: MapReduce with two ethernet cards

Posted by Brandon Williams <dr...@gmail.com>.
What is your rpc_address set to?  If it's 0.0.0.0 (bind everything)
then that's not going to work if listen_address is blocked.

-Brandon

On Thu, Oct 13, 2011 at 11:13 AM, Scott Fines <Sc...@nisc.coop> wrote:
> I upgraded to cassandra 0.8.7, and the problem persists.
>
> Scott
> ________________________________________
> From: Brandon Williams [driftx@gmail.com]
> Sent: Monday, October 10, 2011 12:28 PM
> To: user@cassandra.apache.org
> Subject: Re: MapReduce with two ethernet cards
>
> On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines <Sc...@nisc.coop> wrote:
>> Hi all,
>> This may be a silly question, but I'm at a bit of a loss, and was hoping for
>> some help.
>> I have a Cassandra cluster set up with two NICs--one for internal
>> communication between cassandra machines (10.1.1.*), and one to respond to
>> Thrift RPC (172.28.*.*).
>> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
>> remain separate from Cassandra, so I've written a little MapReduce job to
>> copy data from Cassandra to Hadoop. However, when I try to run my job, I
>> get
>> java.io.IOException: failed connecting to all endpoints
>> 10.1.1.24,10.1.1.17,10.1.1.16
>> which is puzzling to me. It seems like the MR is attempting to connect to
>> the internal communication IPs instead of the external Thrift IPs. Since I
>> set up a firewall to block external access to the internal IPs of Cassandra,
>> this is obviously going to fail.
>> So my question is: why does Cassandra MR seem to be grabbing the
>> listen_address instead of the Thrift one? Presuming it's not a funky
>> configuration error or something on my part, is that strictly necessary? All
>> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
>> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
>> Thanks for your help,
>> Scott
>
> Your cassandra is old, upgrade to the latest version.
>
> -Brandon
>

RE: MapReduce with two ethernet cards

Posted by Scott Fines <Sc...@nisc.coop>.
I upgraded to Cassandra 0.8.7, and the problem persists.

Scott
________________________________________
From: Brandon Williams [driftx@gmail.com]
Sent: Monday, October 10, 2011 12:28 PM
To: user@cassandra.apache.org
Subject: Re: MapReduce with two ethernet cards

On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines <Sc...@nisc.coop> wrote:
> Hi all,
> This may be a silly question, but I'm at a bit of a loss, and was hoping for
> some help.
> I have a Cassandra cluster set up with two NICs--one for internal
> communication between cassandra machines (10.1.1.*), and one to respond to
> Thrift RPC (172.28.*.*).
> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
> remain separate from Cassandra, so I've written a little MapReduce job to
> copy data from Cassandra to Hadoop. However, when I try to run my job, I
> get
> java.io.IOException: failed connecting to all endpoints
> 10.1.1.24,10.1.1.17,10.1.1.16
> which is puzzling to me. It seems like the MR is attempting to connect to
> the internal communication IPs instead of the external Thrift IPs. Since I
> set up a firewall to block external access to the internal IPs of Cassandra,
> this is obviously going to fail.
> So my question is: why does Cassandra MR seem to be grabbing the
> listen_address instead of the Thrift one? Presuming it's not a funky
> configuration error or something on my part, is that strictly necessary? All
> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
> Thanks for your help,
> Scott

Your Cassandra is old; upgrade to the latest version.

-Brandon

Re: MapReduce with two ethernet cards

Posted by Brandon Williams <dr...@gmail.com>.
On Mon, Oct 10, 2011 at 11:47 AM, Scott Fines <Sc...@nisc.coop> wrote:
> Hi all,
> This may be a silly question, but I'm at a bit of a loss, and was hoping for
> some help.
> I have a Cassandra cluster set up with two NICs--one for internal
> communication between cassandra machines (10.1.1.*), and one to respond to
> Thrift RPC (172.28.*.*).
> I also have a Hadoop cluster set up, which, for unrelated reasons, has to
> remain separate from Cassandra, so I've written a little MapReduce job to
> copy data from Cassandra to Hadoop. However, when I try to run my job, I
> get
> java.io.IOException: failed connecting to all endpoints
> 10.1.1.24,10.1.1.17,10.1.1.16
> which is puzzling to me. It seems like the MR is attempting to connect to
> the internal communication IPs instead of the external Thrift IPs. Since I
> set up a firewall to block external access to the internal IPs of Cassandra,
> this is obviously going to fail.
> So my question is: why does Cassandra MR seem to be grabbing the
> listen_address instead of the Thrift one? Presuming it's not a funky
> configuration error or something on my part, is that strictly necessary? All
> told, I'd prefer if it was connecting to the Thrift IPs, but if it can't,
> should I open up port 7000 or port 9160 between Hadoop and Cassandra?
> Thanks for your help,
> Scott

Your Cassandra is old; upgrade to the latest version.

-Brandon