Posted to common-user@hadoop.apache.org by Andreas Kostyrka <an...@kostyrka.org> on 2008/03/10 16:49:09 UTC

S3/EC2 setup problem: port 9001 unreachable

Hi!

I'm trying to set up a Hadoop 0.16.0 cluster on EC2/S3 (manually, not
using the Hadoop AMIs).

I've got the S3-based HDFS working, but I'm stumped when I try to get a
test job running:

hadoop@ec2-67-202-58-97:~/hadoop-0.16.0$ time bin/hadoop jar contrib/streaming/hadoop-0.16.0-streaming.jar -mapper /tmp/test.sh -reducer cat -input testlogs/* -output testlogs-output
additionalConfSpec_:null
null=@@@userJobConfProps_.get(stream.shipped.hadoopstreaming
packageJobJar: [/tmp/hadoop-hadoop/hadoop-unjar17969/] [] /tmp/streamjob17970.jar tmpDir=null
08/03/10 14:01:28 INFO mapred.FileInputFormat: Total input paths to process : 152
08/03/10 14:02:58 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-hadoop/mapred/local]
08/03/10 14:02:58 INFO streaming.StreamJob: Running job: job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: To kill this job, run:
08/03/10 14:02:58 INFO streaming.StreamJob: /home/hadoop/hadoop-0.16.0/bin/../bin/hadoop job  -Dmapred.job.tracker=ec2-67-202-58-97.compute-1.amazonaws.com:9001 -kill job_200803101400_0001
08/03/10 14:02:58 INFO streaming.StreamJob: Tracking URL: http://ip-10-251-75-165.ec2.internal:50030/jobdetails.jsp?jobid=job_200803101400_0001
08/03/10 14:02:59 INFO streaming.StreamJob:  map 0%  reduce 0%

Furthermore, when I connect to port 9001 on 10.251.75.165 via telnet from the master host itself, it succeeds:
hadoop@ec2-67-202-58-97:~/hadoop-0.16.0$ telnet 10.251.75.165 9001
Trying 10.251.75.165...
Connected to 10.251.75.165.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

When I try the same from other VMs in my cluster, it just hangs
(tcpdump on the master host shows no activity for TCP port 9001):

hadoop@ec2-67-202-37-210:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 9001
Trying 10.251.75.165...

hadoop@ec2-67-202-37-210:~/hadoop-0.16.0$ telnet ip-10-251-75-165.ec2.internal 22
Trying 10.251.75.165...
Connected to ip-10-251-75-165.ec2.internal.
Escape character is '^]'.
SSH-2.0-OpenSSH_4.3p2 Debian-9
^]
telnet> quit
Connection closed.

The same symptom appears on port 50030, where the job tracker web UI shows 0 nodes ready to process the job.
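The "connects vs. hangs" distinction above can be checked without telnet. A minimal sketch (host and port below are placeholders, substitute whatever you are testing): a timeout with no response is the classic signature of a firewall or security group silently dropping packets, while "connection refused" means the host answered but nothing is listening.

```shell
# Probe a TCP port and classify it as open, filtered, or closed,
# using bash's /dev/tcp pseudo-device and coreutils timeout.
check_port() {
  local host=$1 port=$2
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
  case $? in
    0)   echo "open" ;;                          # connect succeeded
    124) echo "filtered (timed out)" ;;          # packets silently dropped
    *)   echo "closed (connection refused)" ;;   # host answered, no listener
  esac
}
check_port 127.0.0.1 45678   # a port with (presumably) no listener
```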

Furthermore, the slaves show the following messages:
2008-03-10 15:30:11,455 INFO org.apache.hadoop.ipc.RPC: Problem connecting to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001
2008-03-10 15:31:12,465 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 1 time(s).
2008-03-10 15:32:13,475 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: ec2-67-202-58-97.compute-1.amazonaws.com/10.251.75.165:9001. Already tried 2 time(s).

Last but not least, here is my site conf:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>

<property>
  <name>fs.default.name</name>
  <value>s3://lookhad</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>2DFGTTFSDFDSZU5SDSD7S5202</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>RUWgsdfsd67SFDfsdflaI9Gjzfsd8789ksd2r1PtG</value>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>ec2-67-202-58-97.compute-1.amazonaws.com:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
</configuration>

The master node is not listening only on localhost:
hadoop@ec2-67-202-58-97:~/hadoop-0.16.0$ netstat -an | grep 9001
tcp        0      0 10.251.75.165:9001      0.0.0.0:*               LISTEN     

My conclusions so far:

1.) It's not a general connectivity problem, because I can connect to port 22 without any problems.
2.) On the other hand, connectivity on port 9001 seems to be blocked even inside the same group.
3.) The AWS docs all say that VMs in the same group have no firewalls between them.

So what is happening here? Any ideas?

Andreas

Re: S3/EC2 setup problem: port 9001 unreachable

Posted by Chris K Wensel <ch...@wensel.net>.
Andreas

Here are some moderately useful notes on using EC2/S3, mostly learned
while leveraging Hadoop. The "groups can't see themselves" issue is
listed <grin>.

http://www.manamplified.org/archives/2008/03/notes-on-using-ec2-s3.html

enjoy
ckw

On Mar 10, 2008, at 9:51 AM, Andreas Kostyrka wrote:

> Found it, was security group setup problem ;(
>
> Andreas
>
> On Monday, 2008-03-10 at 16:49 +0100, Andreas Kostyrka wrote:
>> [...]

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/




Re: S3/EC2 setup problem: port 9001 unreachable

Posted by Andreas Kostyrka <an...@kostyrka.org>.
Found it, was security group setup problem ;(

Andreas
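For anyone hitting the same wall: with the classic ec2-api-tools of that era, the fix is to authorize the security group as a traffic source for itself, which opens all ports between instances in the group (including 9001). A hypothetical sketch; the group name "hadoop-cluster" and the account id are placeholders for your own values:

```shell
# Allow all traffic between instances in the same security group by
# adding the group as a source for itself. The second argument to -o is
# the source group; -u is the AWS account id that owns that group.
ec2-authorize hadoop-cluster -o hadoop-cluster -u 111122223333
```

Without such a self-referencing rule, instances in a group can reach each other only on ports opened to the world (such as 22), which matches the symptoms above.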

On Monday, 2008-03-10 at 16:49 +0100, Andreas Kostyrka wrote:
> [...]