Posted to user@whirr.apache.org by "Periya.Data" <pe...@gmail.com> on 2011/12/05 05:08:30 UTC

hadoop issues on Ubuntu AMIs

Hi,
    I feel I am not doing something right. I have switched between several
AMIs and am still unable to see Hadoop running on the EC2 instance, and
unable to run the basic "hadoop fs -ls /" command on it. Here are my configs
and outputs:

Andrei suggested that I use 64-bit 10.04 LTS, but I saw a similar issue
there too. Also, I was not able to find a 64-bit 10.04 LTS AMI that runs on
m1.small (I have seen some built for t1.micro and other larger instances).

# 32-bit  10.04 LTS EBS
whirr.image-id=us-east-1/ami-ab36fbc2
whirr.hardware-id=m1.small

whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
===========================================

sri@PeriyaData:~$ ssh -i ~/.ssh/id_rsa
ec2-174-129-113-79.compute-1.amazonaws.com
The authenticity of host
'ec2-174-129-113-79.compute-1.amazonaws.com(174.129.113.79)' can't be
established.
RSA key fingerprint is 0b:33:c6:f2:5f:0e:a2:97:8a:75:1c:be:37:2f:c2:85.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added
'ec2-174-129-113-79.compute-1.amazonaws.com,174.129.113.79'
(RSA) to the list of known hosts.
Linux domU-12-31-39-09-9D-E4 2.6.32-318-ec2 #38-Ubuntu SMP Thu Sep 1
17:54:33 UTC 2011 i686 GNU/Linux
Ubuntu 10.04.3 LTS

Welcome to Ubuntu!
 * Documentation:  https://help.ubuntu.com/

  System information as of Mon Dec  5 03:29:47 UTC 2011

  System load:  0.33              Processes:           65
  Usage of /:   13.6% of 7.87GB   Users logged in:     0
  Memory usage: 14%               IP address for eth0: 10.210.162.18
  Swap usage:   0%

  Graph this data and manage this system at https://landscape.canonical.com/
---------------------------------------------------------------------
At the moment, only the core of the system is installed. To tune the
system to your needs, you can choose to install one or more
predefined collections of software by running the following
command:

   sudo tasksel --section server
---------------------------------------------------------------------

Get cloud support with Ubuntu Advantage Cloud Guest
  http://www.ubuntu.com/business/services/cloud
Last login: Mon Dec  5 03:28:47 2011 from
108-90-42-72.lightspeed.sntcca.sbcglobal.net
sri@domU-12-31-39-09-9D-E4:~$
sri@domU-12-31-39-09-9D-E4:~$
sri@domU-12-31-39-09-9D-E4:~$ java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) Client VM (build 20.1-b02, mixed mode, sharing)
sri@domU-12-31-39-09-9D-E4:~$ hadoop version
hadoop: command not found
sri@domU-12-31-39-09-9D-E4:~$
sri@PeriyaData:~$ whirr list-cluster --config ~/pd/hadoop-cfg.properties
us-east-1/i-63615b00    us-east-1/ami-ab36fbc2    174.129.113.79
10.210.162.18    RUNNING    us-east-1a    hadoop-namenode,hadoop-jobtracker
us-east-1/i-e3615b80    us-east-1/ami-ab36fbc2    50.19.22.60
10.214.6.244    RUNNING    us-east-1a    hadoop-datanode,hadoop-tasktracker
us-east-1/i-e1615b82    us-east-1/ami-ab36fbc2    50.19.6.250
10.254.79.245    RUNNING    us-east-1a    hadoop-datanode,hadoop-tasktracker
sri@PeriyaData:~$
sri@PeriyaData:~$
sri@PeriyaData:~$ export HADOOP_CONF_DIR=~/.whirr/HadoopCluster/
sri@PeriyaData:~$
sri@domU-12-31-39-09-9D-E4:~$

*This is on the instance*
sri@domU-12-31-39-09-9D-E4:~$ hadoop fs -ls /
hadoop: command not found
sri@domU-12-31-39-09-9D-E4:~$ hadoop version
hadoop: command not found
sri@domU-12-31-39-09-9D-E4:~$

Now, this is on my local laptop:
sri@PeriyaData:~$ hadoop fs -ls /
11/12/04 19:47:02 WARN conf.Configuration: DEPRECATED: hadoop-site.xml
found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use
core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of
core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/12/04 19:47:04 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 0 time(s).
11/12/04 19:47:05 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 1 time(s).
11/12/04 19:47:06 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 2 time(s).
11/12/04 19:47:07 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 3 time(s).
11/12/04 19:47:08 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 4 time(s).
11/12/04 19:47:09 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 5 time(s).
11/12/04 19:47:10 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 6 time(s).
11/12/04 19:47:11 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 7 time(s).
11/12/04 19:47:12 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 8 time(s).
11/12/04 19:47:13 INFO ipc.Client: Retrying connect to server:
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020. Already
tried 9 time(s).
*Bad connection to FS. command aborted. exception: Call to
ec2-174-129-113-79.compute-1.amazonaws.com/174.129.113.79:8020 failed on
local exception: java.net.SocketException: Connection refused*
sri@PeriyaData:~$

*This is on the instance*
srivathsan@domU-12-31-39-09-9D-E4:/tmp/logs$ more stdout.log
Reading package lists...
hadoop-0.20.2.tar.gz: OK
srivathsan@domU-12-31-39-09-9D-E4:/tmp/logs$
============================================

*Questions:*

   1. Assuming everything is fine, where does Hadoop get installed on the
   EC2 instance? What is the path?
   2. Even if Hadoop is successfully installed on the EC2 instance, are the
   env variables properly set on that instance? For example, the PATH must
   be updated in its .bashrc or .bash_profile ...right?
   3. Am I missing any important step here which is not documented?
   4. The stdout.log file on the instance says "Reading package lists...".
   I do not see logs about Hadoop getting installed, as I do for Java
   ("setting up sun-java6-jdk" ...). Is there a way to enable verbose
   logging? I am using m1.small hardware, so I am sure it has enough space
   to install and run Hadoop.
   5. If you know of any Ubuntu AMI on which you have consistently run
   Hadoop, please let me know. I will definitely try it.

I am asking the above questions because I feel I am not looking in the
right place. After switching several AMIs, if I still see the same
behavior, I must be looking in the wrong places.

I am doing something stupid here. Not sure what. I am properly exporting
the Hadoop conf dir. The SSH key pairs are good. I do not know why the
connection gets refused, and I do not understand the last line (the *Bad
connection to FS* error above). Am I missing any important step?

Also, the funny thing is this: I am able to see the dfshealth.jsp page in
my Firefox browser (after running the proxy shell script). But when I
click the link to browse the filesystem, it is unable to display it... a
connection-to-server problem!

Any suggestions/best practices?

Thanks,
PD

Re: hadoop issues on Ubuntu AMIs

Posted by Andrei Savu <sa...@gmail.com>.
See inline.

On Wed, Dec 7, 2011 at 7:14 AM, Periya.Data <pe...@gmail.com> wrote:

> Thanks! A few observations:
>
>    - After I export the conf dir and execute "hadoop fs -ls /", I see a
>    different dir structure from what I see when I SSH into the machine and
>    execute it as root. See outputs below.
>
> sri@PeriyaData:~$
> sri@PeriyaData:~$ export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/
> sri@PeriyaData:~$
> sri@PeriyaData:~$ hadoop fs -ls /
> Found 25 items
> -rw-------   1 root root    4767328 2011-11-02 12:55 /vmlinuz
> drwxr-xr-x   - root root      12288 2011-12-03 10:49 /etc
> dr-xr-xr-x   - root root          0 2011-12-02 03:28 /proc
> drwxrwxrwt   - root root       4096 2011-12-05 18:07 /tmp
> drwxr-xr-x   - root root       4096 2011-04-25 15:50 /srv
> -rw-r--r--   1 root root   13631900 2011-11-01 22:46 /initrd.img.old
> drwx------   - root root       4096 2011-11-23 22:27 /root
> drwxr-xr-x   - root root       4096 2011-04-21 09:50 /mnt
> drwxr-xr-x   - root root       4096 2011-12-02 09:01 /var
> drwxr-xr-x   - root root       4096 2011-10-01 19:14 /cdrom
> -rw-------   1 root root    4766528 2011-10-07 14:03 /vmlinuz.old
> drwxr-xr-x   - root root        780 2011-12-02 16:28 /run
> drwxr-xr-x   - root root       4096 2011-10-23 18:27 /usr
> drwx------   - root root      16384 2011-10-01 19:05 /lost+found
> drwxr-xr-x   - root root       4096 2011-11-22 22:26 /bin
> drwxr-xr-x   - root root       4096 2011-04-25 15:50 /opt
> drwxr-xr-x   - root root       4096 2011-10-01 19:21 /home
> drwxr-xr-x   - root root       4320 2011-12-02 11:29 /dev
> drwxr-xr-x   - root root       4096 2011-03-21 01:26 /selinux
> drwxr-xr-x   - root root       4096 2011-11-22 22:31 /boot
> drwxr-xr-x   - root root          0 2011-12-02 03:28 /sys
> -rw-r--r--   1 root root   13645361 2011-11-22 22:31 /initrd.img
> drwxr-xr-x   - root root       4096 2011-11-22 22:28 /lib
> drwxr-xr-x   - root root       4096 2011-12-03 10:49 /media
> drwxr-xr-x   - root root      12288 2011-11-22 22:29 /sbin
> sri@PeriyaData:~$
> sri@PeriyaData:~$


This is no different from the output you get when running "ls -l /", and
it is happening because Hadoop is not able to find the config file. Try:

$ export HADOOP_CONF_DIR=~/.whirr/HadoopCluster/

When running "hadoop fs -ls /" you should get the same output as below.

Note: make sure the SOCKS proxy is running.

% . ~/.whirr/HadoopCluster/hadoop-proxy.sh
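
Putting it together, a minimal sketch of the client-side setup (assuming
the default cluster name "HadoopCluster"; the proxy script blocks while
the tunnel is up, so run it in a separate terminal):

# terminal 1: keep the SOCKS proxy running
. ~/.whirr/HadoopCluster/hadoop-proxy.sh

# terminal 2: point the client at the Whirr-generated config and test
export HADOOP_CONF_DIR=~/.whirr/HadoopCluster/
hadoop fs -ls /    # should now list HDFS (e.g. /hadoop, /tmp, /user)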


*After SSH-ing into the master node:*
>
> sri@ip-10-90-131-240:~$ sudo su
> root@ip-10-90-131-240:/home/users/sri#
>
> root@ip-10-90-131-240:/home/users/jtv# jps
> 2860 Jps
> 2667 JobTracker
> 2088 NameNode
> root@ip-10-90-131-240:/home/users/jtv# hadoop fs -ls /
> Error: JAVA_HOME is not set.
> root@ip-10-90-131-240:/home/users/jtv#
>
> *After setting JAVA_HOME in the .bashrc file and sourcing it, I get the
> expected dir structure:*
>
> root@ip-10-90-131-240:/home/users/sri# hadoop fs -ls /
> Found 3 items
> drwxr-xr-x   - hadoop supergroup          0 2011-12-05 23:09 /hadoop
> drwxrwxrwx   - hadoop supergroup          0 2011-12-05 23:08 /tmp
> drwxrwxrwx   - hadoop supergroup          0 2011-12-06 01:16 /user
> root@ip-10-90-131-240:/home/users/sri#
> root@ip-10-90-131-240:/home/users/sri#
>
> Is the above normal behavior?
>

It looks normal to me. I think you should be able to load data & run MR
jobs as expected. Can you open an issue
so that we can make sure that JAVA_HOME is exported as expected by the
install script?
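
In the meantime, a workaround sketch for the master node (the JDK path is
an assumption based on where the sun-java6-jdk package usually installs):

export JAVA_HOME=/usr/lib/jvm/java-6-sun  # adjust if the JDK lives elsewhere
hadoop fs -ls /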


>
> Thanks,
> PD/
>
>
>
>  *Questions:*
>>>
>>>    1. Assuming everything is fine, where does Hadoop get installed on
>>>    the EC2 instance? What is the path?
>>>
>>>
>> Run jps as root and you should see the daemons running.
>>
>>>
>>>    1. Even if Hadoop is successfully installed on the EC2 instance, are
>>>    the env variables properly set on that instance? For example, the PATH
>>>    must be updated in its .bashrc or .bash_profile ...right?
>>>
>>>
>> Try to run "hadoop fs -ls /" as root.
>>
>>
>

Re: hadoop issues on Ubuntu AMIs

Posted by "Periya.Data" <pe...@gmail.com>.
Thanks! A few observations:

   - After I export the conf dir and execute "hadoop fs -ls /", I see a
   different dir structure from what I see when I SSH into the machine and
   execute it as root. See outputs below.

sri@PeriyaData:~$
sri@PeriyaData:~$ export HADOOP_CONF_DIR=/\$HOME/.whirr/HadoopCluster/
sri@PeriyaData:~$
sri@PeriyaData:~$ hadoop fs -ls /
Found 25 items
-rw-------   1 root root    4767328 2011-11-02 12:55 /vmlinuz
drwxr-xr-x   - root root      12288 2011-12-03 10:49 /etc
dr-xr-xr-x   - root root          0 2011-12-02 03:28 /proc
drwxrwxrwt   - root root       4096 2011-12-05 18:07 /tmp
drwxr-xr-x   - root root       4096 2011-04-25 15:50 /srv
-rw-r--r--   1 root root   13631900 2011-11-01 22:46 /initrd.img.old
drwx------   - root root       4096 2011-11-23 22:27 /root
drwxr-xr-x   - root root       4096 2011-04-21 09:50 /mnt
drwxr-xr-x   - root root       4096 2011-12-02 09:01 /var
drwxr-xr-x   - root root       4096 2011-10-01 19:14 /cdrom
-rw-------   1 root root    4766528 2011-10-07 14:03 /vmlinuz.old
drwxr-xr-x   - root root        780 2011-12-02 16:28 /run
drwxr-xr-x   - root root       4096 2011-10-23 18:27 /usr
drwx------   - root root      16384 2011-10-01 19:05 /lost+found
drwxr-xr-x   - root root       4096 2011-11-22 22:26 /bin
drwxr-xr-x   - root root       4096 2011-04-25 15:50 /opt
drwxr-xr-x   - root root       4096 2011-10-01 19:21 /home
drwxr-xr-x   - root root       4320 2011-12-02 11:29 /dev
drwxr-xr-x   - root root       4096 2011-03-21 01:26 /selinux
drwxr-xr-x   - root root       4096 2011-11-22 22:31 /boot
drwxr-xr-x   - root root          0 2011-12-02 03:28 /sys
-rw-r--r--   1 root root   13645361 2011-11-22 22:31 /initrd.img
drwxr-xr-x   - root root       4096 2011-11-22 22:28 /lib
drwxr-xr-x   - root root       4096 2011-12-03 10:49 /media
drwxr-xr-x   - root root      12288 2011-11-22 22:29 /sbin
sri@PeriyaData:~$
sri@PeriyaData:~$

*After SSH-ing into the master node:*

sri@ip-10-90-131-240:~$ sudo su
root@ip-10-90-131-240:/home/users/sri#

root@ip-10-90-131-240:/home/users/jtv# jps
2860 Jps
2667 JobTracker
2088 NameNode
root@ip-10-90-131-240:/home/users/jtv# hadoop fs -ls /
Error: JAVA_HOME is not set.
root@ip-10-90-131-240:/home/users/jtv#

*After setting JAVA_HOME in the .bashrc file and sourcing it, I get the
expected dir structure:*

root@ip-10-90-131-240:/home/users/sri# hadoop fs -ls /
Found 3 items
drwxr-xr-x   - hadoop supergroup          0 2011-12-05 23:09 /hadoop
drwxrwxrwx   - hadoop supergroup          0 2011-12-05 23:08 /tmp
drwxrwxrwx   - hadoop supergroup          0 2011-12-06 01:16 /user
root@ip-10-90-131-240:/home/users/sri#
root@ip-10-90-131-240:/home/users/sri#

Is the above normal behavior?

Thanks,
PD/


*Questions:*
>>
>>    1. Assuming everything is fine, where does Hadoop gets installed on
>>    the EC2 instance? What is the path?
>>
>>
> Run jps as root and you should see the daemons running.
>
>>
>>    1. Even if Hadoop is successfully installed on the EC2 instance, are
>>    the env variables properly changed on that instance? Like, path must be
>>    updated either on its .bashrc or .bash_profile ...right?
>>
>>
> Try to run "hadoop fs -ls /" as root.
>
>

Re: hadoop issues on Ubuntu AMIs

Posted by Andrei Savu <sa...@gmail.com>.
Here you can find a list of Ubuntu AMIs packaged by Canonical:
http://cloud.ubuntu.com/ami/

Try a recipe like this:

whirr.cluster-name=hadoop-asavu

whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,1 hadoop-datanode+hadoop-tasktracker

whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop

whirr.provider=aws-ec2
whirr.identity=${env:AWS_ACCESS_KEY_ID}
whirr.credential=${env:AWS_SECRET_ACCESS_KEY}

If you don't specify an AMI ID, Whirr will automatically select an Ubuntu
10.04 image for you.
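
With a file like the above saved as, say, hadoop.properties (the name is
just an example), the usual cycle is:

whirr launch-cluster --config hadoop.properties
whirr list-cluster --config hadoop.properties
whirr destroy-cluster --config hadoop.properties   # when you are done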


>
> *Questions:*
>
>    1. Assuming everything is fine, where does Hadoop get installed on
>    the EC2 instance? What is the path?
>
>
Run jps as root and you should see the daemons running.
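
For what it's worth, the CDH packages should put Hadoop under
/usr/lib/hadoop* with a launcher on the default PATH (an assumption based
on the CDH3 package layout). A quick check on the instance:

which hadoop            # typically /usr/bin/hadoop
ls -d /usr/lib/hadoop*  # jars, bin/, conf symlink, etc.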

>
>    1. Even if Hadoop is successfully installed on the EC2 instance, are
>    the env variables properly set on that instance? For example, the PATH
>    must be updated in its .bashrc or .bash_profile ...right?
>
>
Try to run "hadoop fs -ls /" as root.

>
>    1. Am I missing any important step here which is not documented?
>
Nope.


>
>    1. The stdout.log file on the instance says "Reading package lists...".
>    I do not see logs about Hadoop getting installed, as I do for Java
>    ("setting up sun-java6-jdk" ...). Is there a way to enable verbose
>    logging? I am using m1.small hardware, so I am sure it has enough space
>    to install and run Hadoop.
>    2. If you know of any Ubuntu AMI on which you have consistently run
>    Hadoop, please let me know. I will definitely try it.
>
> I am asking the above questions because I feel I am not looking in the
> right place. After switching several AMIs, if I still see the same
> behavior, I must be looking in the wrong places.
>
> I am doing something stupid here. Not sure what. I am properly exporting
> the Hadoop conf dir. The SSH key pairs are good. I do not know why the
> connection gets refused, and I do not understand the last line (the *Bad
> connection to FS* error above). Am I missing any important step?
>
> Also, the funny thing is this: I am able to see the dfshealth.jsp page in
> my Firefox browser (after running the proxy shell script). But when I
> click the link to browse the filesystem, it is unable to display it... a
> connection-to-server problem!
>

Have you also added the proxy to Firefox?
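
hadoop-proxy.sh opens a SOCKS proxy on localhost (port 6666 in the default
Whirr setup, if I remember correctly). In Firefox, go to Preferences ->
Advanced -> Network -> Settings, select manual proxy configuration and set
SOCKS host to localhost, port 6666. Otherwise the links on dfshealth.jsp
resolve to internal cluster addresses that your browser cannot reach
directly.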


>
> Any suggestions/best practices?
>
> Thanks,
> PD
>
>