Posted to common-user@hadoop.apache.org by Andrey Pankov <ap...@iponweb.net> on 2008/03/18 14:17:57 UTC

Issue with cluster over EC2 and different AMI types

Hi all,

I'm trying to configure a Hadoop cluster on Amazon EC2: one m1.small 
instance for the master node and several m1.large instances for the 
slaves. Both the master's and the slaves' AMIs have the same Hadoop 
version, 0.16.0.

I start the EC2 instances with ec2-run-instances, using the same 
--group parameter but in two steps: one call for the master, a second 
call for the slaves.
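
For example, something like this (the AMI IDs, group name, key pair 
and instance counts are placeholders, not the actual values):

   # master: one m1.small instance
   ec2-run-instances ami-aaaaaaaa -n 1 -t m1.small -k my-keypair -g hadoop-cluster
   # slaves: m1.large instances in the same security group
   ec2-run-instances ami-bbbbbbbb -n 4 -t m1.large -k my-keypair -g hadoop-cluster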

It looks like EC2 instances of different instance types start up in 
different networks. For example, here are the external and internal 
DNS names:

   * ec2-67-202-59-12.compute-1.amazonaws.com
     ip-10-251-74-181.ec2.internal - for the small instance
   * ec2-67-202-3-191.compute-1.amazonaws.com
     domU-12-31-38-00-5C-C1.compute-1.internal - for the large instance

The trouble is that the slaves cannot contact the master. When I set 
the fs.default.name parameter in hadoop-site.xml on a slave box to the 
master's full DNS name (either external or internal) and try to start 
a datanode on it (bin/hadoop-daemon.sh ... start datanode), Hadoop 
replaces fs.default.name with just 'ip-10-251-74-181' and puts this in 
the log:

2008-03-18 07:08:16,028 ERROR org.apache.hadoop.dfs.DataNode: 
java.net.UnknownHostException: unknown host: ip-10-251-74-181
...

So the DataNode could not be started.
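
For reference, the slave-side hadoop-site.xml entry (inside the 
<configuration> element) would be along these lines; the port is a 
placeholder, use whatever your master actually listens on:

   <property>
     <name>fs.default.name</name>
     <value>ip-10-251-74-181.ec2.internal:50001</value>
   </property>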

I tried adding the IP address of ip-10-251-74-181 to /etc/hosts on 
each slave instance, and that let the DataNode start on the slaves; it 
also became possible to store data in HDFS. But when I try to run a 
map-reduce job (from a jar file), it doesn't work: the job stays alive 
but makes no progress at all. Hadoop prints Map 0% Reduce 0% and just 
freezes.
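
The /etc/hosts entry on each slave looked like this (the address is 
the one encoded in the master's internal hostname; substitute your 
master's actual internal IP):

   10.251.74.181   ip-10-251-74-181 ip-10-251-74-181.ec2.internal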

I can't find anything in the logs that helps, either on the master or 
on the slave boxes.

I found that dfs.network.script could be used to specify a network 
location for a machine, but I have no idea how to use it. Can racks 
help me here?

Thanks in advance.

---
Andrey Pankov



Re: Issue with cluster over EC2 and different AMI types

Posted by Tom White <to...@gmail.com>.
Unfortunately there is no way to discover the rack that EC2 instances
are running on, so you won't be able to use this optimization.
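
For what it's worth, a dfs.network.script is just an executable that
prints the network location of the machine it runs on. A minimal
sketch, assuming you treat every node as sitting on a single flat
"rack" (which is all you can honestly claim on EC2):

   #!/bin/sh
   # Print this machine's network location for HDFS rack awareness.
   # No real rack information is available on EC2, so report one flat
   # location for every node.
   echo /default-rack

You would point dfs.network.script in hadoop-site.xml at that file,
but since every node would report the same location, it gains you
nothing here.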

Tom

On 18/03/2008, Andrey Pankov <ap...@iponweb.net> wrote:
> Hi,
>
>  I apologize. It was my fault - I forgot to run the tasktracker on the slaves.
>  But anyway, can anyone share their experience with how to use racks?
>  Thanks.
>
>  Andrey Pankov wrote:
>  > [...]


-- 
Blog: http://www.lexemetech.com/

Re: Issue with cluster over EC2 and different AMI types

Posted by Andrey Pankov <ap...@iponweb.net>.
Hi,

I apologize. It was my fault - I forgot to run the tasktracker on the slaves.
But anyway, can anyone share their experience with how to use racks?
Thanks.
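
For the record, starting it by hand on each slave is just:

   bin/hadoop-daemon.sh start tasktracker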

Andrey Pankov wrote:
> [...]

---
Andrey Pankov