Posted to common-user@hadoop.apache.org by Andrey Pankov <ap...@iponweb.net> on 2008/03/18 14:17:57 UTC
Issue with cluster over EC2 and different AMI types
Hi all,
I'm trying to configure a Hadoop cluster on Amazon EC2, with one m1.small
instance for the master node and some m1.large instances for the slaves.
Both the master's and the slaves' AMIs have the same version of Hadoop, 0.16.0.
I run the EC2 instances using ec2-run-instances with the same --group
parameter, but in two steps: one call to start the master, a second call
to start the slaves.
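For illustration, the two calls might look roughly like the following; the
AMI IDs, key pair name, and instance count are placeholders, not values
from the original setup:

# launch the master (m1.small is the default instance type)
ec2-run-instances ami-masterimage -n 1 -k my-keypair --group hadoop-cluster
# launch the slaves as m1.large instances in the same security group
ec2-run-instances ami-slaveimage -n 4 -t m1.large -k my-keypair --group hadoop-cluster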
It looks like EC2 instances with different AMI types start up in
different networks; for example, their external and internal DNS names are:
* ec2-67-202-59-12.compute-1.amazonaws.com
ip-10-251-74-181.ec2.internal - for the small instance
* ec2-67-202-3-191.compute-1.amazonaws.com
domU-12-31-38-00-5C-C1.compute-1.internal - for the large one
The trouble is that the slaves cannot contact the master. When I set the
fs.default.name parameter in hadoop-site.xml on a slave box to the full
DNS name of the master (either external or internal) and try to start the
datanode on it (bin/hadoop-daemon.sh ... start datanode), Hadoop
replaces fs.default.name with just 'ip-10-251-74-181' and writes to the log:
2008-03-18 07:08:16,028 ERROR org.apache.hadoop.dfs.DataNode:
java.net.UnknownHostException: unknown host: ip-10-251-74-181
...
So the DataNode cannot be started.
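For reference, the configuration being set on each slave would look roughly
like this; the port number and the conf path are illustrative, not taken
from the original setup:

# sketch of the slave-side hadoop-site.xml pointing at the master's
# internal DNS name (port 50001 is just an example)
cat > conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>ip-10-251-74-181.ec2.internal:50001</value>
  </property>
</configuration>
EOF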
I tried adding the IP address of ip-10-251-74-181 to /etc/hosts on each
slave instance, and that helped to start the DataNode on the slaves. It
also became possible to store something in HDFS. But when I try to run a
map-reduce job (from a jar file), it doesn't work. I mean the job keeps
running but there is no progress at all: Hadoop prints Map 0% Reduce
0% and just freezes.
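The workaround above amounts to something like the following on each slave;
the address is inferred from the hostname, since EC2 internal names of the
ip-A-B-C-D form encode the instance's private IP:

# map the master's short hostname to its private address on every slave
echo '10.251.74.181  ip-10-251-74-181' >> /etc/hosts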
I cannot find anything in the logs that would help, either on the master
or on the slave boxes.
I found that dfs.network.script could be used to specify a network
location for a machine somehow, but I have no idea how to use it. Can
racks help me with it?
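For what it's worth, a minimal sketch of such a script, assuming
dfs.network.script points at an executable whose standard output is taken
as the local node's network location (the rack names here are made up):

#!/bin/sh
# print this node's network location based on its EC2 internal hostname
case "$(hostname)" in
  ip-10-*)  echo /ec2/rack-small ;;  # small instances (ip-... names)
  domU-*)   echo /ec2/rack-large ;;  # large instances (domU-... names)
  *)        echo /default-rack   ;;
esac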
Thanks in advance.
---
Andrey Pankov
Re: Issue with cluster over EC2 and different AMI types
Posted by Tom White <to...@gmail.com>.
Unfortunately there is no way to discover the rack that EC2 instances
are running on, so you won't be able to use this optimization.
Tom
On 18/03/2008, Andrey Pankov <ap...@iponweb.net> wrote:
> Hi,
>
> I apologize. It was my fault - I forgot to start the tasktracker on the slaves.
> But anyway, can anyone share their experience with how to use racks?
> Thanks.
--
Blog: http://www.lexemetech.com/
Re: Issue with cluster over EC2 and different AMI types
Posted by Andrey Pankov <ap...@iponweb.net>.
Hi,
I apologize. It was my fault - I forgot to start the tasktracker on the slaves.
But anyway, can anyone share their experience with how to use racks?
Thanks.
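For the record, the missing daemon can be started on each slave the same
way as the datanode; a minimal example, assuming the stock scripts and the
default conf directory:

# start the MapReduce worker daemon on each slave
bin/hadoop-daemon.sh start tasktracker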
---
Andrey Pankov