Posted to user@whirr.apache.org by pr...@nokia.com on 2011/01/10 19:27:56 UTC

Dynamic creation and destroying hadoop on Rackspace

Hello all,
We have a few Hadoop jobs that we are running on the Rackspace Cloud. The jobs run for a total of 3 to 5 hours a day. Currently I have manually installed and configured Hadoop on Rackspace, which is a laborious process (especially given that we have about 10 environments that we need to configure). So my question is about automatically creating and destroying a Hadoop cluster using a program (preferably Java). Here is my current deployment.

Glassfish (Node 1)
Mysql (Node 2)
Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)

We can install Glassfish and MySQL manually, but we would like to dynamically create/install the Hadoop cluster, start the servers, run the jobs, and then destroy the cluster on the cloud. The primary purpose of doing this is to make deployment easy and to save costs. Since the jobs run for only a few hours a day, we don't want to keep Hadoop running on the cloud for the whole day.

Jeff Hammerbacher from Cloudera suggested I look at Whirr and was positive that I can do the above steps using it. Has anyone done this using Whirr on Rackspace? I could not find any examples of how to dynamically install a Hadoop cluster on Rackspace. Any information on this task would be greatly appreciated.

Thanks
Praveen
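
For reference, this whole lifecycle can be driven from the Whirr command line; a minimal sketch, assuming a Whirr 0.2/0.3-style distribution and a hadoop.properties recipe like the one posted later in this thread (the cluster name, job jar and file names are placeholders):

  % bin/whirr launch-cluster --config hadoop.properties    # provision the nodes and start Hadoop
  % . ~/.whirr/myhadoopcluster/hadoop-proxy.sh              # open the SOCKS proxy (in a separate terminal)
  % hadoop jar my-job.jar ...                               # run the jobs against the new cluster
  % bin/whirr destroy-cluster --config hadoop.properties    # tear the cluster down again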




Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Andrei Savu <sa...@gmail.com>.
Have you checked that the SOCKS proxy is working as expected? Maybe
there are some connectivity errors.

I have pasted below the relevant paragraph from the Quick Start Guide:

http://incubator.apache.org/whirr/quick-start-guide.html

-----------------
Run a proxy

For security reasons, traffic from the network your client is running
on is proxied through the master node of the cluster using an SSH
tunnel (a SOCKS proxy on port 6666).

A script to launch the proxy is created when you launch the cluster,
and may be found in ~/.whirr/<cluster-name>. Run it as follows (in a
new terminal window):

% . ~/.whirr/myhadoopcluster/hadoop-proxy.sh
To stop the proxy, just kill the process with Ctrl-C.

Web browsers need to be configured to use this proxy too, so you can
view pages served by worker nodes in the cluster. The most convenient
way to do this is to use a proxy auto-config (PAC) file, such as
this one [1] for Hadoop EC2 clusters.

If you are using Firefox, then you may find FoxyProxy [2] useful for
managing PAC files.

[1] http://apache-hadoop-ec2.s3.amazonaws.com/proxy.pac
[2] http://foxyproxy.mozdev.org/
-----------------
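
The generated hadoop-proxy.sh is essentially an SSH dynamic port forward; a rough sketch of what it runs (the key path, user and hostname are placeholders - check the script Whirr wrote for your cluster):

  ssh -i ~/.ssh/id_rsa -o ConnectTimeout=10 -o ServerAliveInterval=60 \
      -N -D 6666 root@<namenode-hostname>

For hadoop commands to actually use the tunnel, the client has to pick up the hadoop-site.xml that Whirr writes next to the proxy script, which normally points the standard Hadoop settings hadoop.rpc.socket.factory.class.default at org.apache.hadoop.net.SocksSocketFactory and hadoop.socks.server at localhost:6666 - for example by setting HADOOP_CONF_DIR to ~/.whirr/<cluster-name>.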

Hope you will find this helpful :)

On Mon, Jan 10, 2011 at 11:21 PM,  <pr...@nokia.com> wrote:
> Yes I have run the proxy script after the cluster has been started.
>
> [root@mymachine ~]# .whirr/relevancy2cluster/hadoop-proxy.sh
> Running proxy to Hadoop cluster at 184-106-158-27.static.cloud-ips.com. Use Ctrl-c to quit.
>
>
> Praveen
> -----Original Message-----
> From: olivier.grisel@gmail.com [mailto:olivier.grisel@gmail.com] On Behalf Of ext Olivier Grisel
> Sent: Monday, January 10, 2011 4:09 PM
> To: whirr-user@incubator.apache.org
> Cc: tom@cloudera.com; hammer@cloudera.com
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> 2011/1/10  <pr...@nokia.com>:
>> Hi Tom,
>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command-line tool. We haven't tried the Java client yet (we will do it soon). But with the command-line tool, we could not access hadoop fs or any of the hadoop commands. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to the hadoop master node. We tried as the hadoop user and as root, but no luck. Do you think we are missing anything?
>>
>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
>> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
>
> It looks as if the ssh tunnel is not active. Have you run the proxy shell script right after the cluster started message is displayed?
>
>  ~/.whirr/myhadoopcluster/hadoop-proxy.sh
>
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>



-- 
Andrei Savu -- andreisavu.ro

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Yes, I have run the proxy script after the cluster has been started.

[root@mymachine ~]# .whirr/relevancy2cluster/hadoop-proxy.sh 
Running proxy to Hadoop cluster at 184-106-158-27.static.cloud-ips.com. Use Ctrl-c to quit.
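
With the proxy running, the tunnel itself can be sanity-checked from the client; for example (hostname and ports as above, shown purely for illustration):

  netstat -an | grep 6666        # the SOCKS proxy should be LISTENing on localhost:6666
  curl --socks5 localhost:6666 http://184-106-158-27.static.cloud-ips.com:50070/    # namenode web UI via the tunnel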

 
Praveen
-----Original Message-----
From: olivier.grisel@gmail.com [mailto:olivier.grisel@gmail.com] On Behalf Of ext Olivier Grisel
Sent: Monday, January 10, 2011 4:09 PM
To: whirr-user@incubator.apache.org
Cc: tom@cloudera.com; hammer@cloudera.com
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

2011/1/10  <pr...@nokia.com>:
> Hi Tom,
> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>
> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

It looks as if the ssh tunnel is not active. Have you run the proxy shell script right after the cluster started message is displayed?

  ~/.whirr/myhadoopcluster/hadoop-proxy.sh


--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Olivier Grisel <ol...@ensta.org>.
2011/1/10  <pr...@nokia.com>:
> Hi Tom,
> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>
> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively

It looks as if the SSH tunnel is not active. Have you run the proxy
shell script right after the "cluster started" message is displayed?

  ~/.whirr/myhadoopcluster/hadoop-proxy.sh


-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Tom White <to...@gmail.com>.
On Thu, Jan 13, 2011 at 2:42 PM,  <pr...@nokia.com> wrote:
> Thanks Tom.
>
> 1. Is there already a script with default values for these properties that I need to modify and place in a web-accessible folder?

Yes, look in the scripts directory in the distribution.

> 2. Does it have to be web-accessible only?

Yes.

> Is it possible to have the script on the local machine where I am launching the hadoop cluster?

Not yet, but we'd like to move to such an approach (e.g.
https://issues.apache.org/jira/browse/WHIRR-99).

> 3. Is this fix in the Whirr 0.3.0 release time frame, so I can avoid the dependency on a web server? It looks like HDFS properties can already be overridden in hadoop.properties?

This would be supported by
https://issues.apache.org/jira/browse/WHIRR-55, but this may not make
0.3.0.
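
Until then, the FAQ workflow amounts to copying the stock scripts, editing the values, and serving the copies over HTTP; a rough sketch (the directory layout, web host and the whirr.run-url-base property are assumptions to be checked against the FAQ and your distribution):

  cp -r whirr-0.3.0-incubating-SNAPSHOT/scripts my-scripts
  vi my-scripts/apache/hadoop/post-configure       # or cloudera/cdh/post-configure when running CDH
  # adjust MAX_MAP_TASKS, MAX_REDUCE_TASKS, CHILD_OPTS etc., then publish the tree
  scp -r my-scripts user@some-web-host:/var/www/whirr-scripts/
  # and point the cluster at the copies in hadoop.properties:
  #   whirr.run-url-base=http://some-web-host/whirr-scripts/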

Hope that helps.

Cheers
Tom

>
> Praveen
>
> -----Original Message-----
> From: ext Tom White [mailto:tom.e.white@gmail.com]
> Sent: Thursday, January 13, 2011 5:27 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> You need to override these values in the scripts themselves until
> https://issues.apache.org/jira/browse/WHIRR-55 is done. See http://incubator.apache.org/whirr/faq.html#How_can_I_modify_the_instance_installation_and_configuration_scripts
>
> Cheers
> Tom
>
> On Thu, Jan 13, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>
>>  I was successfully able to launch hadoop cluster in rackspace dynamically but I am trying to figure out how to override the hadoop properties with different values. For example I want the following properties to have specific values that are optimal for our data. The default values that Whirr goes with are not suitable for us.
>>
>> mapred.reduce.tasks=24
>> mapred.map.tasks=64
>> mapred.child.java.opts=-Xmx2048m
>>
>> Praveen
>>
>> -----Original Message-----
>> From: Peddi Praveen (Nokia-MS/Boston)
>> Sent: Wednesday, January 12, 2011 8:00 PM
>> To: whirr-user@incubator.apache.org
>> Cc: tom@cloudera.com
>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>
>> Please ignore my previous email That is not an issue anymore. It turns out that the hadoop versions didn't match. When I installed the same hadoop version on my client it worked fine.
>>
>> Praveen
>> ________________________________________
>> From: Peddi Praveen (Nokia-MS/Boston)
>> Sent: Wednesday, January 12, 2011 7:04 PM
>> To: whirr-user@incubator.apache.org
>> Cc: tom@cloudera.com
>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>
>> Hi Andrei,
>> Thanks for your suggestion. I was able to  launch hadoop cluster using
>> Whirr from trunk. I can get to 50030 and 50070 on hadoop master so my
>> hadoop is running. However when I try to issue hadoop commands
>>
>> I started the proxy before running the hadoop command.
>>
>> root@relevancy-service:~# /usr/local/software/hadoop-0.20.2/bin/hadoop fs -lsr /11/01/13 00:02:32 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively Bad connection to FS. command aborted.
>>
>> Praveen
>>
>> ________________________________________
>> From: ext Andrei Savu [savu.andrei@gmail.com]
>> Sent: Tuesday, January 11, 2011 2:19 PM
>> To: whirr-user@incubator.apache.org
>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>
>> On Tue, Jan 11, 2011 at 7:24 PM,  <pr...@nokia.com> wrote:
>>> Another thing I noticed from whirr.log is below. It looks like it's trying to change ownership to the hadoop user, but the hadoop user doesn't exist on the hadoop master. Am I missing anything?
>>
>> No. I have been able to replicate this using the same version you are using. I believe that you have found a bug in Whirr 0.2.0.
>>
>> I suggest you use the trunk version; it's stable and works fine using the same properties file - I have tested it on Rackspace Cloud.
>>
>> Whirr 0.3.0 should be ready for release in a few weeks.
>>
>>>
>>> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) <<
>>> stderr from runscript as root@xx.xx.xx.xx
>>> + DFS_DATA_DIR=/data/hadoop/hdfs/data
>>> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
>>> + MAX_MAP_TASKS=2
>>> + MAX_REDUCE_TASKS=1
>>> + CHILD_OPTS=-Xmx550m
>>> + CHILD_ULIMIT=1126400
>>> + TMP_DIR='/data/tmp/hadoop-${user.name}'
>>> + mkdir -p /data/hadoop
>>> + chown hadoop:hadoop /data/hadoop
>>> chown: invalid user: `hadoop:hadoop'
>>>
>>> Praveen
>>>
>>> -----Original Message-----
>>> From: Peddi Praveen (Nokia-MS/Boston)
>>> Sent: Tuesday, January 11, 2011 11:27 AM
>>> To: whirr-user@incubator.apache.org; tom@cloudera.com
>>> Cc: hammer@cloudera.com
>>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> whirr.hadoop-install-runurl=cloudera/cdh/install
>>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>>
>>> Do these two properties install all the Cloudera packages or just Hadoop from the Cloudera distribution? I am thinking something could be wrong here...
>>>
>>> Praveen
>>>
>>> -----Original Message-----
>>> From: Peddi Praveen (Nokia-MS/Boston)
>>> Sent: Monday, January 10, 2011 7:22 PM
>>> To: tom@cloudera.com
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Here are the properties. Please note that I also tried without specifying whirr.hadoop-install-runurl and got the same problem.
>>>
>>> whirr.service-name=hadoop
>>> whirr.cluster-name=relevancycluster
>>> whirr.instance-templates=1 jt+nn,1 dn+tt
>>> whirr.provider=cloudservers
>>> whirr.identity=<rackspace-id>
>>> whirr.credential=<rackspace-api-password>
>>> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
>>> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>>>
>>> # Uncomment these lines to run CDH
>>> whirr.hadoop-install-runurl=cloudera/cdh/install
>>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>>
>>> # The size of the instance to use. See
>>> http://www.rackspacecloud.com/cloud_hosting_products/serv$
>>> # id 3: 1GB, 1 virtual core
>>> # id 4: 2GB, 2 virtual cores
>>> # id 5: 4GB, 2 virtual cores
>>> # id 6: 8GB, 4 virtual cores
>>> # id 7: 15.5GB, 4 virtual cores
>>> whirr.hardware-id=4
>>> # Ubuntu 10.04 LTS Lucid
>>> whirr.image-id=49
>>>
>>> ________________________________________
>>> From: ext Tom White [tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 7:03 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Can you post your Whirr properties file please (with credentials removed).
>>>
>>> Thanks
>>> Tom
>>>
>>> On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
>>>> I am using the latest Whirr. For hadoop, I actually specified the Cloudera URL in the properties file, but on the hadoop master machine I saw references to hadoop-0.20. The OS of my client is CentOS, but I am going with the default OS for hadoop, which is Ubuntu 10.04.
>>>>
>>>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>>>>
>>>>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>>>>> It looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on the master node either.
>>>>>>
>>>>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>>>
>>>>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from the hadoop master, I see the following:
>>>>>>
>>>>>> --------------------------------
>>>>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>>>> starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>>>>>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
>>>>>
>>>>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>>>>
>>>>> Tom
>>>>>
>>>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>>>>> localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>>>>>> localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
>>>>>> localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>>>>>> localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
>>>>>> starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>>>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>>>> localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>>>>>> localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
>>>>>> --------------------------------
>>>>>>
>>>>>> Praveen
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>>
>>>>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>>>>
>>>>>> Tom
>>>>>>
>>>>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>>>>> Hi Tom,
>>>>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>>>>
>>>>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
>>>>>>> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
>>>>>>> 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>>>>
>>>>>>> I should say Whirr is cool so far!
>>>>>>>
>>>>>>> Thanks again
>>>>>>> Praveen
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>>>
>>>>>>> Hi Praveen,
>>>>>>>
>>>>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>>>>
>>>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServiceController.java
>>>>>>>
>>>>>>> Finally, check out the recipes for advice on setting
>>>>>>> configuration for
>>>>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Tom
>>>>>>>
>>>>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>>>>> Hello all,
>>>>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>>>>> laborious process (especially given that we have about 10
>>>>>>>> environments that we need to configure). So my question is about
>>>>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>>>>> Here is my current deployment.
>>>>>>>>
>>>>>>>> Glassfish (Node 1)
>>>>>>>> Mysql (Node 2)
>>>>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>>>>
>>>>>>>> We can install Glassfish and MySql manually but we would like to
>>>>>>>> dynamically create/install hadoop cluster, start the servers,
>>>>>>>> run jobs and then destroy cluster on the cloud. Primary purpose
>>>>>>>> of doing this is to make deployment easy and save costs. Since
>>>>>>>> the jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>>>>
>>>>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr
>>>>>>>> and he was positive that I can do the above steps using Whirr.
>>>>>>>> Has anyone done this using Whirr on Rackspace. I could not find
>>>>>>>> any examples on how to dynamically install Hadoop cluster on
>>>>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Praveen
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Andrei Savu -- andreisavu.ro
>>
>

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Thanks Tom.

1. Is there already a script with default values for these properties that I need to modify and place in a web-accessible folder?
2. Does it have to be web-accessible only? Is it possible to have the script on the local machine where I am launching the hadoop cluster?
3. Is this fix in the Whirr 0.3.0 release time frame, so I can avoid the dependency on a web server? It looks like HDFS properties can already be overridden in hadoop.properties?

Praveen

-----Original Message-----
From: ext Tom White [mailto:tom.e.white@gmail.com]
Sent: Thursday, January 13, 2011 5:27 PM
To: whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

You need to override these values in the scripts themselves until
https://issues.apache.org/jira/browse/WHIRR-55 is done. See http://incubator.apache.org/whirr/faq.html#How_can_I_modify_the_instance_installation_and_configuration_scripts

Cheers
Tom

On Thu, Jan 13, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>
>  I was successfully able to launch hadoop cluster in rackspace dynamically but I am trying to figure out how to override the hadoop properties with different values. For example I want the following properties to have specific values that are optimal for our data. The default values that Whirr goes with are not suitable for us.
>
> mapred.reduce.tasks=24
> mapred.map.tasks=64
> mapred.child.java.opts=-Xmx2048m
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Wednesday, January 12, 2011 8:00 PM
> To: whirr-user@incubator.apache.org
> Cc: tom@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Please ignore my previous email That is not an issue anymore. It turns out that the hadoop versions didn't match. When I installed the same hadoop version on my client it worked fine.
>
> Praveen
> ________________________________________
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Wednesday, January 12, 2011 7:04 PM
> To: whirr-user@incubator.apache.org
> Cc: tom@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Hi Andrei,
> Thanks for your suggestion. I was able to  launch hadoop cluster using
> Whirr from trunk. I can get to 50030 and 50070 on hadoop master so my
> hadoop is running. However when I try to issue hadoop commands
>
> I started the proxy before running the hadoop command.
>
> root@relevancy-service:~# /usr/local/software/hadoop-0.20.2/bin/hadoop fs -lsr /11/01/13 00:02:32 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively Bad connection to FS. command aborted.
>
> Praveen
>
> ________________________________________
> From: ext Andrei Savu [savu.andrei@gmail.com]
> Sent: Tuesday, January 11, 2011 2:19 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> On Tue, Jan 11, 2011 at 7:24 PM,  <pr...@nokia.com> wrote:
>> Another thing I noticed from whirr.log is below. Looks like its trying to change ownership to hadoop user but hadoop user doesn't exist on hadoop master. Am I missing anything?
>
> No. I have been able to replicate this using the same version you are using. I believe that you have found a bug in Whirr 0.2.0.
>
> I suggest that you should use the trunk version, it's stable and it works fine using the same properties file - I have tested it on rackspacecloud.
>
> Whirr 0.3.0 should be ready for release in a few weeks.
>
>>
>> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) <<
>> stderr from runscript as root@xx.xx.xx.xx
>> + DFS_DATA_DIR=/data/hadoop/hdfs/data
>> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
>> + MAX_MAP_TASKS=2
>> + MAX_REDUCE_TASKS=1
>> + CHILD_OPTS=-Xmx550m
>> + CHILD_ULIMIT=1126400
>> + TMP_DIR='/data/tmp/hadoop-${user.name}'
>> + mkdir -p /data/hadoop
>> + chown hadoop:hadoop /data/hadoop
>> chown: invalid user: `hadoop:hadoop'
>>
>> Praveen
>>
>> -----Original Message-----
>> From: Peddi Praveen (Nokia-MS/Boston)
>> Sent: Tuesday, January 11, 2011 11:27 AM
>> To: whirr-user@incubator.apache.org; tom@cloudera.com
>> Cc: hammer@cloudera.com
>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>
>> whirr.hadoop-install-runurl=cloudera/cdh/install
>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>
>> Does these two properties install all cloudera packages or just the hadoop from cloudera dist? I am thinking something could be wrong here...
>>
>> Praveen
>>
>> -----Original Message-----
>> From: Peddi Praveen (Nokia-MS/Boston)
>> Sent: Monday, January 10, 2011 7:22 PM
>> To: tom@cloudera.com
>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>
>> Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.
>>
>> whirr.service-name=hadoop
>> whirr.cluster-name=relevancycluster
>> whirr.instance-templates=1 jt+nn,1 dn+tt whirr.provider=cloudservers
>> whirr.identity=<rackspace-id>
>> whirr.credential=<rckspace-api-password>
>> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
>> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>>
>> # Uncomment out these lines to run CDH
>> whirr.hadoop-install-runurl=cloudera/cdh/install
>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>
>> # The size of the instance to use. See
>> http://www.rackspacecloud.com/cloud_hosting_products/serv$
>> # id 3: 1GB, 1 virtual core
>> # id 4: 2GB, 2 virtual cores
>> # id 5: 4GB, 2 virtual cores
>> # id 6: 8GB, 4 virtual cores
>> # id 7: 15.5GB, 4 virtual cores
>> whirr.hardware-id=4
>> # Ubuntu 10.04 LTS Lucid
>> whirr.image-id=49
>>
>> ________________________________________
>> From: ext Tom White [tom@cloudera.com]
>> Sent: Monday, January 10, 2011 7:03 PM
>> To: Peddi Praveen (Nokia-MS/Boston)
>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>
>> Can you post your Whirr properties file please (with credentials removed).
>>
>> Thanks
>> Tom
>>
>> On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
>>> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>>>
>>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>>>
>>>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>>>
>>>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>>
>>>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>>>
>>>>> --------------------------------
>>>>> root@hadoop-master:~#
>>>>> /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>>> starting namenode, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-
>>>>> 1 0 6-96-62.static.cloud-ips.com.out May not run daemons as root.
>>>>> Please specify HADOOP_NAMENODE_USER
>>>>
>>>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>>>
>>>> Tom
>>>>
>>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>>>> localhost: starting datanode, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-
>>>>> 1
>>>>> 0
>>>>> 6-96-62.static.cloud-ips.com.out
>>>>> localhost: May not run daemons as root. Please specify
>>>>> HADOOP_DATANODE_USER
>>>>> localhost: starting secondarynamenode, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondaryname
>>>>> n o de-184-106-96-62.static.cloud-ips.com.out
>>>>> localhost: May not run daemons as root. Please specify
>>>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-18
>>>>> 4
>>>>> -
>>>>> 106-96-62.static.cloud-ips.com.out
>>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>>> localhost: starting tasktracker, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-1
>>>>> 8
>>>>> 4 -106-96-62.static.cloud-ips.com.out
>>>>> localhost: May not run daemons as root. Please specify
>>>>> HADOOP_TASKTRACKER_USER
>>>>> --------------------------------
>>>>>
>>>>> Praveen
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>>>
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>>>> Hi Tom,
>>>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>>>
>>>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs
>>>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>>>
>>>>>> I should say Whirr is cool so far!
>>>>>>
>>>>>> Thanks again
>>>>>> Praveen
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>>
>>>>>> Hi Praveen,
>>>>>>
>>>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>>>
>>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/ha
>>>>>> d
>>>>>> o
>>>>>> op/
>>>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopS
>>>>>> e
>>>>>> r
>>>>>> vic
>>>>>> eController.java
>>>>>>
>>>>>> Finally, check out the recipes for advice on setting
>>>>>> configuration for
>>>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>>>
>>>>>> Thanks,
>>>>>> Tom
>>>>>>
>>>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>>>> Hello all,
>>>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>>>> laborious process (especially given that we have about 10
>>>>>>> environments that we need to configure). So my question is about
>>>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>>>> Here is my current deployment.
>>>>>>>
>>>>>>> Glassfish (Node 1)
>>>>>>> Mysql (Node 2)
>>>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>>>
>>>>>>> We can install Glassfish and MySql manually but we would like to
>>>>>>> dynamically create/install hadoop cluster, start the servers,
>>>>>>> run jobs and then destroy cluster on the cloud. Primary purpose
>>>>>>> of doing this is to make deployment easy and save costs. Since
>>>>>>> the jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>>>
>>>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr
>>>>>>> and he was positive that I can do the above steps using Whirr.
>>>>>>> Has anyone done this using Whirr on Rackspace. I could not find
>>>>>>> any examples on how to dynamically install Hadoop cluster on
>>>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Praveen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>
>
>
> --
> Andrei Savu -- andreisavu.ro
>

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Tom White <to...@gmail.com>.
You need to override these values in the scripts themselves until
https://issues.apache.org/jira/browse/WHIRR-55 is done. See
http://incubator.apache.org/whirr/faq.html#How_can_I_modify_the_instance_installation_and_configuration_scripts

Cheers
Tom

On Thu, Jan 13, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>
>  I was successfully able to launch hadoop cluster in rackspace dynamically but I am trying to figure out how to override the hadoop properties with different values. For example I want the following properties to have specific values that are optimal for our data. The default values that Whirr goes with are not suitable for us.
>
> mapred.reduce.tasks=24
> mapred.map.tasks=64
> mapred.child.java.opts=-Xmx2048m
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Wednesday, January 12, 2011 8:00 PM
> To: whirr-user@incubator.apache.org
> Cc: tom@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Please ignore my previous email That is not an issue anymore. It turns out that the hadoop versions didn't match. When I installed the same hadoop version on my client it worked fine.
>
> Praveen
> ________________________________________
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Wednesday, January 12, 2011 7:04 PM
> To: whirr-user@incubator.apache.org
> Cc: tom@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Hi Andrei,
> Thanks for your suggestion. I was able to  launch hadoop cluster using Whirr from trunk. I can get to 50030 and 50070 on hadoop master so my hadoop is running. However when I try to issue hadoop commands
>
> I started the proxy before running the hadoop command.
>
> root@relevancy-service:~# /usr/local/software/hadoop-0.20.2/bin/hadoop fs -lsr /11/01/13 00:02:32 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively Bad connection to FS. command aborted.
>
> Praveen
>
> ________________________________________
> From: ext Andrei Savu [savu.andrei@gmail.com]
> Sent: Tuesday, January 11, 2011 2:19 PM
> To: whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> On Tue, Jan 11, 2011 at 7:24 PM,  <pr...@nokia.com> wrote:
>> Another thing I noticed from whirr.log is below. Looks like its trying to change ownership to hadoop user but hadoop user doesn't exist on hadoop master. Am I missing anything?
>
> No. I have been able to replicate this using the same version you are using. I believe that you have found a bug in Whirr 0.2.0.
>
> I suggest that you should use the trunk version, it's stable and it works fine using the same properties file - I have tested it on rackspacecloud.
>
> Whirr 0.3.0 should be ready for release in a few weeks.
>
>>
>> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) <<
>> stderr from runscript as root@xx.xx.xx.xx
>> + DFS_DATA_DIR=/data/hadoop/hdfs/data
>> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
>> + MAX_MAP_TASKS=2
>> + MAX_REDUCE_TASKS=1
>> + CHILD_OPTS=-Xmx550m
>> + CHILD_ULIMIT=1126400
>> + TMP_DIR='/data/tmp/hadoop-${user.name}'
>> + mkdir -p /data/hadoop
>> + chown hadoop:hadoop /data/hadoop
>> chown: invalid user: `hadoop:hadoop'
>>
>> Praveen
>>
>> -----Original Message-----
>> From: Peddi Praveen (Nokia-MS/Boston)
>> Sent: Tuesday, January 11, 2011 11:27 AM
>> To: whirr-user@incubator.apache.org; tom@cloudera.com
>> Cc: hammer@cloudera.com
>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>
>> whirr.hadoop-install-runurl=cloudera/cdh/install
>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>
>> Does these two properties install all cloudera packages or just the hadoop from cloudera dist? I am thinking something could be wrong here...
>>
>> Praveen
>>
>> -----Original Message-----
>> From: Peddi Praveen (Nokia-MS/Boston)
>> Sent: Monday, January 10, 2011 7:22 PM
>> To: tom@cloudera.com
>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>>
>> Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.
>>
>> whirr.service-name=hadoop
>> whirr.cluster-name=relevancycluster
>> whirr.instance-templates=1 jt+nn,1 dn+tt whirr.provider=cloudservers
>> whirr.identity=<rackspace-id> whirr.credential=<rckspace-api-password>
>> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
>> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>>
>> # Uncomment out these lines to run CDH
>> whirr.hadoop-install-runurl=cloudera/cdh/install
>> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>>
>> # The size of the instance to use. See
>> http://www.rackspacecloud.com/cloud_hosting_products/serv$
>> # id 3: 1GB, 1 virtual core
>> # id 4: 2GB, 2 virtual cores
>> # id 5: 4GB, 2 virtual cores
>> # id 6: 8GB, 4 virtual cores
>> # id 7: 15.5GB, 4 virtual cores
>> whirr.hardware-id=4
>> # Ubuntu 10.04 LTS Lucid
>> whirr.image-id=49
>>
>> ________________________________________
>> From: ext Tom White [tom@cloudera.com]
>> Sent: Monday, January 10, 2011 7:03 PM
>> To: Peddi Praveen (Nokia-MS/Boston)
>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>
>> Can you post your Whirr properties file please (with credentials removed).
>>
>> Thanks
>> Tom
>>
>> On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
>>> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>>>
>>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>>>
>>>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>>>
>>>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>>
>>>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>>>
>>>>> --------------------------------
>>>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>>> starting namenode, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-1
>>>>> 0 6-96-62.static.cloud-ips.com.out May not run daemons as root.
>>>>> Please specify HADOOP_NAMENODE_USER
>>>>
>>>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>>>
>>>> Tom
>>>>
>>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>>>> localhost: starting datanode, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-1
>>>>> 0
>>>>> 6-96-62.static.cloud-ips.com.out
>>>>> localhost: May not run daemons as root. Please specify
>>>>> HADOOP_DATANODE_USER
>>>>> localhost: starting secondarynamenode, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamen
>>>>> o de-184-106-96-62.static.cloud-ips.com.out
>>>>> localhost: May not run daemons as root. Please specify
>>>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184
>>>>> -
>>>>> 106-96-62.static.cloud-ips.com.out
>>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>>> localhost: starting tasktracker, logging to
>>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-18
>>>>> 4 -106-96-62.static.cloud-ips.com.out
>>>>> localhost: May not run daemons as root. Please specify
>>>>> HADOOP_TASKTRACKER_USER
>>>>> --------------------------------
>>>>>
>>>>> Praveen
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>>>
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>>>> Hi Tom,
>>>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>>>
>>>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs
>>>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>>>
>>>>>> I should say Whirr is cool so far!
>>>>>>
>>>>>> Thanks again
>>>>>> Praveen
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>>
>>>>>> Hi Praveen,
>>>>>>
>>>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>>>
>>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/had
>>>>>> o
>>>>>> op/
>>>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopSe
>>>>>> r
>>>>>> vic
>>>>>> eController.java
>>>>>>
>>>>>> Finally, check out the recipes for advice on setting configuration
>>>>>> for
>>>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>>>
>>>>>> Thanks,
>>>>>> Tom
>>>>>>
>>>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>>>> Hello all,
>>>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>>>> laborious process (especially given that we have about 10
>>>>>>> environments that we need to configure). So my question is about
>>>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>>>> Here is my current deployment.
>>>>>>>
>>>>>>> Glassfish (Node 1)
>>>>>>> Mysql (Node 2)
>>>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>>>
>>>>>>> We can install Glassfish and MySql manually but we would like to
>>>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>>>> jobs and then destroy cluster on the cloud. Primary purpose of
>>>>>>> doing this is to make deployment easy and save costs. Since the
>>>>>>> jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>>>
>>>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and
>>>>>>> he was positive that I can do the above steps using Whirr. Has
>>>>>>> anyone done this using Whirr on Rackspace. I could not find any
>>>>>>> examples on how to dynamically install Hadoop cluster on
>>>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Praveen
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>
>
>
> --
> Andrei Savu -- andreisavu.ro
>

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
 I was able to successfully launch a Hadoop cluster on Rackspace dynamically, but I am trying to figure out how to override the hadoop properties with different values. For example, I want the following properties to have specific values that are optimal for our data. The default values that Whirr goes with are not suitable for us.

mapred.reduce.tasks=24
mapred.map.tasks=64
mapred.child.java.opts=-Xmx2048m

Praveen

-----Original Message-----
From: Peddi Praveen (Nokia-MS/Boston)
Sent: Wednesday, January 12, 2011 8:00 PM
To: whirr-user@incubator.apache.org
Cc: tom@cloudera.com
Subject: RE: Dynamic creation and destroying hadoop on Rackspace

Please ignore my previous email; that is not an issue anymore. It turns out that the hadoop versions didn't match. When I installed the same hadoop version on my client, it worked fine.
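
A quick way to confirm that the client and the cluster now agree (the install path is the one mentioned earlier in the thread; adjust for your layout):

  /usr/local/software/hadoop-0.20.2/bin/hadoop version      # on the client
  ssh root@<namenode-hostname> 'hadoop version'             # on the master, for comparison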

Praveen
________________________________________
From: Peddi Praveen (Nokia-MS/Boston)
Sent: Wednesday, January 12, 2011 7:04 PM
To: whirr-user@incubator.apache.org
Cc: tom@cloudera.com
Subject: RE: Dynamic creation and destroying hadoop on Rackspace

Hi Andrei,
Thanks for your suggestion. I was able to launch a Hadoop cluster using Whirr from trunk. I can get to ports 50030 and 50070 on the hadoop master, so Hadoop is running. However, when I try to issue hadoop commands I get the error below.

I started the proxy before running the hadoop command.

root@relevancy-service:~# /usr/local/software/hadoop-0.20.2/bin/hadoop fs -lsr /
11/01/13 00:02:32 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Bad connection to FS. command aborted.

Praveen

________________________________________
From: ext Andrei Savu [savu.andrei@gmail.com]
Sent: Tuesday, January 11, 2011 2:19 PM
To: whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

On Tue, Jan 11, 2011 at 7:24 PM,  <pr...@nokia.com> wrote:
> Another thing I noticed from whirr.log is below. Looks like its trying to change ownership to hadoop user but hadoop user doesn't exist on hadoop master. Am I missing anything?

No. I have been able to replicate this using the same version you are using. I believe that you have found a bug in Whirr 0.2.0.

I suggest that you should use the trunk version, it's stable and it works fine using the same properties file - I have tested it on rackspacecloud.

Whirr 0.3.0 should be ready for release in a few weeks.

>
> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) <<
> stderr from runscript as root@xx.xx.xx.xx
> + DFS_DATA_DIR=/data/hadoop/hdfs/data
> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
> + MAX_MAP_TASKS=2
> + MAX_REDUCE_TASKS=1
> + CHILD_OPTS=-Xmx550m
> + CHILD_ULIMIT=1126400
> + TMP_DIR='/data/tmp/hadoop-${user.name}'
> + mkdir -p /data/hadoop
> + chown hadoop:hadoop /data/hadoop
> chown: invalid user: `hadoop:hadoop'
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, January 11, 2011 11:27 AM
> To: whirr-user@incubator.apache.org; tom@cloudera.com
> Cc: hammer@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> Does these two properties install all cloudera packages or just the hadoop from cloudera dist? I am thinking something could be wrong here...
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Monday, January 10, 2011 7:22 PM
> To: tom@cloudera.com
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.
>
> whirr.service-name=hadoop
> whirr.cluster-name=relevancycluster
> whirr.instance-templates=1 jt+nn,1 dn+tt whirr.provider=cloudservers
> whirr.identity=<rackspace-id> whirr.credential=<rckspace-api-password>
> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>
> # Uncomment out these lines to run CDH
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> # The size of the instance to use. See
> http://www.rackspacecloud.com/cloud_hosting_products/serv$
> # id 3: 1GB, 1 virtual core
> # id 4: 2GB, 2 virtual cores
> # id 5: 4GB, 2 virtual cores
> # id 6: 8GB, 4 virtual cores
> # id 7: 15.5GB, 4 virtual cores
> whirr.hardware-id=4
> # Ubuntu 10.04 LTS Lucid
> whirr.image-id=49
>
> ________________________________________
> From: ext Tom White [tom@cloudera.com]
> Sent: Monday, January 10, 2011 7:03 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Can you post your Whirr properties file please (with credentials removed).
>
> Thanks
> Tom
>
> On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
>> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>>
>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>>
>>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>>
>>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>
>>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>>
>>>> --------------------------------
>>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>> starting namenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-1
>>>> 0 6-96-62.static.cloud-ips.com.out May not run daemons as root.
>>>> Please specify HADOOP_NAMENODE_USER
>>>
>>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>>
>>> Tom
>>>
>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>>> localhost: starting datanode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-1
>>>> 0
>>>> 6-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_DATANODE_USER
>>>> localhost: starting secondarynamenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamen
>>>> o de-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184
>>>> -
>>>> 106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>> localhost: starting tasktracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-18
>>>> 4 -106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_TASKTRACKER_USER
>>>> --------------------------------
>>>>
>>>> Praveen
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>>
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>>
>>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs
>>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>>
>>>>> I should say Whirr is cool so far!
>>>>>
>>>>> Thanks again
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Hi Praveen,
>>>>>
>>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>>
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/had
>>>>> o
>>>>> op/
>>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopSe
>>>>> r
>>>>> vic
>>>>> eController.java
>>>>>
>>>>> Finally, check out the recipes for advice on setting configuration
>>>>> for
>>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>>> Hello all,
>>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>>> laborious process (especially given that we have about 10
>>>>>> environments that we need to configure). So my question is about
>>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>>> Here is my current deployment.
>>>>>>
>>>>>> Glassfish (Node 1)
>>>>>> Mysql (Node 2)
>>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>>
>>>>>> We can install Glassfish and MySql manually but we would like to
>>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>>> jobs and then destroy cluster on the cloud. Primary purpose of
>>>>>> doing this is to make deployment easy and save costs. Since the
>>>>>> jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>>
>>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and
>>>>>> he was positive that I can do the above steps using Whirr. Has
>>>>>> anyone done this using Whirr on Rackspace. I could not find any
>>>>>> examples on how to dynamically install Hadoop cluster on
>>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> Praveen
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>



--
Andrei Savu -- andreisavu.ro

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Please ignore my previous email. That is not an issue anymore. It turns out that the hadoop versions didn't match. When I installed the same hadoop version on my client, it worked fine.
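
For anyone hitting the same "Bad connection to FS" symptom, a quick way to confirm that the client build matches what Whirr installed on the cluster (the hostname below is a placeholder, and this assumes the hadoop script is on the PATH on the master) is:

  /usr/local/software/hadoop-0.20.2/bin/hadoop version
  ssh root@<master-hostname> hadoop version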

Praveen
________________________________________
From: Peddi Praveen (Nokia-MS/Boston)
Sent: Wednesday, January 12, 2011 7:04 PM
To: whirr-user@incubator.apache.org
Cc: tom@cloudera.com
Subject: RE: Dynamic creation and destroying hadoop on Rackspace

Hi Andrei,
Thanks for your suggestion. I was able to  launch hadoop cluster using Whirr from trunk. I can get to 50030 and 50070 on hadoop master so my hadoop is running. However when I try to issue hadoop commands

I started the proxy before running the hadoop command.

root@relevancy-service:~# /usr/local/software/hadoop-0.20.2/bin/hadoop fs -lsr /11/01/13 00:02:32 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Bad connection to FS. command aborted.

Praveen

________________________________________
From: ext Andrei Savu [savu.andrei@gmail.com]
Sent: Tuesday, January 11, 2011 2:19 PM
To: whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

On Tue, Jan 11, 2011 at 7:24 PM,  <pr...@nokia.com> wrote:
> Another thing I noticed from whirr.log is below. Looks like its trying to change ownership to hadoop user but hadoop user doesn't exist on hadoop master. Am I missing anything?

No. I have been able to replicate this using the same version you are
using. I believe that you have found a bug in Whirr 0.2.0.

I suggest that you should use the trunk version, it's stable and it
works fine using the same properties file - I have tested it on
rackspacecloud.

Whirr 0.3.0 should be ready for release in a few weeks.

>
> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) << stderr from runscript as root@xx.xx.xx.xx
> + DFS_DATA_DIR=/data/hadoop/hdfs/data
> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
> + MAX_MAP_TASKS=2
> + MAX_REDUCE_TASKS=1
> + CHILD_OPTS=-Xmx550m
> + CHILD_ULIMIT=1126400
> + TMP_DIR='/data/tmp/hadoop-${user.name}'
> + mkdir -p /data/hadoop
> + chown hadoop:hadoop /data/hadoop
> chown: invalid user: `hadoop:hadoop'
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, January 11, 2011 11:27 AM
> To: whirr-user@incubator.apache.org; tom@cloudera.com
> Cc: hammer@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> Does these two properties install all cloudera packages or just the hadoop from cloudera dist? I am thinking something could be wrong here...
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Monday, January 10, 2011 7:22 PM
> To: tom@cloudera.com
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.
>
> whirr.service-name=hadoop
> whirr.cluster-name=relevancycluster
> whirr.instance-templates=1 jt+nn,1 dn+tt whirr.provider=cloudservers whirr.identity=<rackspace-id> whirr.credential=<rckspace-api-password>
> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>
> # Uncomment out these lines to run CDH
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> # The size of the instance to use. See http://www.rackspacecloud.com/cloud_hosting_products/serv$
> # id 3: 1GB, 1 virtual core
> # id 4: 2GB, 2 virtual cores
> # id 5: 4GB, 2 virtual cores
> # id 6: 8GB, 4 virtual cores
> # id 7: 15.5GB, 4 virtual cores
> whirr.hardware-id=4
> # Ubuntu 10.04 LTS Lucid
> whirr.image-id=49
>
> ________________________________________
> From: ext Tom White [tom@cloudera.com]
> Sent: Monday, January 10, 2011 7:03 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Can you post your Whirr properties file please (with credentials removed).
>
> Thanks
> Tom
>
> On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
>> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>>
>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>>
>>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>>
>>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>
>>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>>
>>>> --------------------------------
>>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>> starting namenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-10
>>>> 6-96-62.static.cloud-ips.com.out May not run daemons as root. Please
>>>> specify HADOOP_NAMENODE_USER
>>>
>>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>>
>>> Tom
>>>
>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>>> localhost: starting datanode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-10
>>>> 6-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_DATANODE_USER
>>>> localhost: starting secondarynamenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynameno
>>>> de-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-
>>>> 106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>> localhost: starting tasktracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184
>>>> -106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_TASKTRACKER_USER
>>>> --------------------------------
>>>>
>>>> Praveen
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>>
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>>
>>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs
>>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>>
>>>>> I should say Whirr is cool so far!
>>>>>
>>>>> Thanks again
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Hi Praveen,
>>>>>
>>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>>
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hado
>>>>> op/
>>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopSer
>>>>> vic
>>>>> eController.java
>>>>>
>>>>> Finally, check out the recipes for advice on setting configuration
>>>>> for
>>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>>> Hello all,
>>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>>> laborious process (especially given that we have about 10
>>>>>> environments that we need to configure). So my question is about
>>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>>> Here is my current deployment.
>>>>>>
>>>>>> Glassfish (Node 1)
>>>>>> Mysql (Node 2)
>>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>>
>>>>>> We can install Glassfish and MySql manually but we would like to
>>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>>> jobs and then destroy cluster on the cloud. Primary purpose of
>>>>>> doing this is to make deployment easy and save costs. Since the
>>>>>> jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>>
>>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and
>>>>>> he was positive that I can do the above steps using Whirr. Has
>>>>>> anyone done this using Whirr on Rackspace. I could not find any
>>>>>> examples on how to dynamically install Hadoop cluster on
>>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> Praveen
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>



--
Andrei Savu -- andreisavu.ro

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Hi Andrei,
Thanks for your suggestion. I was able to launch a hadoop cluster using Whirr from trunk. I can get to ports 50030 and 50070 on the hadoop master, so hadoop is running. However, when I try to issue hadoop commands I get the error below.

I started the proxy before running the hadoop command.

root@relevancy-service:~# /usr/local/software/hadoop-0.20.2/bin/hadoop fs -lsr /11/01/13 00:02:32 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Bad connection to FS. command aborted.
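
One thing worth double-checking here (the cluster name and paths are illustrative) is that the hadoop client is actually picking up the configuration Whirr writes next to the proxy script, rather than a stale hadoop-site.xml somewhere on its classpath - the DEPRECATED warning above suggests there is one. Assuming the trunk build drops its generated config under ~/.whirr/<cluster-name>/, something like this should send the request through the SOCKS proxy:

  ls ~/.whirr/<cluster-name>/
  /usr/local/software/hadoop-0.20.2/bin/hadoop --config ~/.whirr/<cluster-name>/ fs -lsr /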

Praveen

________________________________________
From: ext Andrei Savu [savu.andrei@gmail.com]
Sent: Tuesday, January 11, 2011 2:19 PM
To: whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

On Tue, Jan 11, 2011 at 7:24 PM,  <pr...@nokia.com> wrote:
> Another thing I noticed from whirr.log is below. Looks like its trying to change ownership to hadoop user but hadoop user doesn't exist on hadoop master. Am I missing anything?

No. I have been able to replicate this using the same version you are
using. I believe that you have found a bug in Whirr 0.2.0.

I suggest that you should use the trunk version, it's stable and it
works fine using the same properties file - I have tested it on
rackspacecloud.

Whirr 0.3.0 should be ready for release in a few weeks.

>
> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) << stderr from runscript as root@xx.xx.xx.xx
> + DFS_DATA_DIR=/data/hadoop/hdfs/data
> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
> + MAX_MAP_TASKS=2
> + MAX_REDUCE_TASKS=1
> + CHILD_OPTS=-Xmx550m
> + CHILD_ULIMIT=1126400
> + TMP_DIR='/data/tmp/hadoop-${user.name}'
> + mkdir -p /data/hadoop
> + chown hadoop:hadoop /data/hadoop
> chown: invalid user: `hadoop:hadoop'
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, January 11, 2011 11:27 AM
> To: whirr-user@incubator.apache.org; tom@cloudera.com
> Cc: hammer@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> Does these two properties install all cloudera packages or just the hadoop from cloudera dist? I am thinking something could be wrong here...
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Monday, January 10, 2011 7:22 PM
> To: tom@cloudera.com
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.
>
> whirr.service-name=hadoop
> whirr.cluster-name=relevancycluster
> whirr.instance-templates=1 jt+nn,1 dn+tt whirr.provider=cloudservers whirr.identity=<rackspace-id> whirr.credential=<rckspace-api-password>
> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>
> # Uncomment out these lines to run CDH
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> # The size of the instance to use. See http://www.rackspacecloud.com/cloud_hosting_products/serv$
> # id 3: 1GB, 1 virtual core
> # id 4: 2GB, 2 virtual cores
> # id 5: 4GB, 2 virtual cores
> # id 6: 8GB, 4 virtual cores
> # id 7: 15.5GB, 4 virtual cores
> whirr.hardware-id=4
> # Ubuntu 10.04 LTS Lucid
> whirr.image-id=49
>
> ________________________________________
> From: ext Tom White [tom@cloudera.com]
> Sent: Monday, January 10, 2011 7:03 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Can you post your Whirr properties file please (with credentials removed).
>
> Thanks
> Tom
>
> On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
>> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>>
>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>>
>>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>>
>>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>
>>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>>
>>>> --------------------------------
>>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>> starting namenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-10
>>>> 6-96-62.static.cloud-ips.com.out May not run daemons as root. Please
>>>> specify HADOOP_NAMENODE_USER
>>>
>>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>>
>>> Tom
>>>
>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>>> localhost: starting datanode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-10
>>>> 6-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_DATANODE_USER
>>>> localhost: starting secondarynamenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynameno
>>>> de-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-
>>>> 106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>> localhost: starting tasktracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184
>>>> -106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_TASKTRACKER_USER
>>>> --------------------------------
>>>>
>>>> Praveen
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>>
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>>
>>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs
>>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>>
>>>>> I should say Whirr is cool so far!
>>>>>
>>>>> Thanks again
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Hi Praveen,
>>>>>
>>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>>
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hado
>>>>> op/
>>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopSer
>>>>> vic
>>>>> eController.java
>>>>>
>>>>> Finally, check out the recipes for advice on setting configuration
>>>>> for
>>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>>> Hello all,
>>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>>> laborious process (especially given that we have about 10
>>>>>> environments that we need to configure). So my question is about
>>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>>> Here is my current deployment.
>>>>>>
>>>>>> Glassfish (Node 1)
>>>>>> Mysql (Node 2)
>>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>>
>>>>>> We can install Glassfish and MySql manually but we would like to
>>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>>> jobs and then destroy cluster on the cloud. Primary purpose of
>>>>>> doing this is to make deployment easy and save costs. Since the
>>>>>> jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>>
>>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and
>>>>>> he was positive that I can do the above steps using Whirr. Has
>>>>>> anyone done this using Whirr on Rackspace. I could not find any
>>>>>> examples on how to dynamically install Hadoop cluster on
>>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> Praveen
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>



--
Andrei Savu -- andreisavu.ro

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Andrei Savu <sa...@gmail.com>.
On Tue, Jan 11, 2011 at 7:24 PM,  <pr...@nokia.com> wrote:
> Another thing I noticed from whirr.log is below. Looks like its trying to change ownership to hadoop user but hadoop user doesn't exist on hadoop master. Am I missing anything?

No. I have been able to replicate this using the same version you are
using. I believe that you have found a bug in Whirr 0.2.0.

I suggest you use the trunk version; it's stable and works fine using
the same properties file - I have tested it on Rackspace Cloud.

Whirr 0.3.0 should be ready for release in a few weeks.
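
A rough sketch of getting a trunk build (the exact commands may have changed since - check the project site - and the launch line follows the trunk quick start of the time):

  svn checkout http://svn.apache.org/repos/asf/incubator/whirr/trunk whirr-trunk
  cd whirr-trunk
  mvn clean install -DskipTests
  bin/whirr launch-cluster --config hadoop.properties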

>
> 2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) << stderr from runscript as root@xx.xx.xx.xx
> + DFS_DATA_DIR=/data/hadoop/hdfs/data
> + MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
> + MAX_MAP_TASKS=2
> + MAX_REDUCE_TASKS=1
> + CHILD_OPTS=-Xmx550m
> + CHILD_ULIMIT=1126400
> + TMP_DIR='/data/tmp/hadoop-${user.name}'
> + mkdir -p /data/hadoop
> + chown hadoop:hadoop /data/hadoop
> chown: invalid user: `hadoop:hadoop'
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Tuesday, January 11, 2011 11:27 AM
> To: whirr-user@incubator.apache.org; tom@cloudera.com
> Cc: hammer@cloudera.com
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> Does these two properties install all cloudera packages or just the hadoop from cloudera dist? I am thinking something could be wrong here...
>
> Praveen
>
> -----Original Message-----
> From: Peddi Praveen (Nokia-MS/Boston)
> Sent: Monday, January 10, 2011 7:22 PM
> To: tom@cloudera.com
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: RE: Dynamic creation and destroying hadoop on Rackspace
>
> Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.
>
> whirr.service-name=hadoop
> whirr.cluster-name=relevancycluster
> whirr.instance-templates=1 jt+nn,1 dn+tt whirr.provider=cloudservers whirr.identity=<rackspace-id> whirr.credential=<rckspace-api-password>
> #whirr.private-key-file=/home/hadoop/.ssh/id_rsa
> #whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub
>
> # Uncomment out these lines to run CDH
> whirr.hadoop-install-runurl=cloudera/cdh/install
> whirr.hadoop-configure-runurl=cloudera/cdh/post-configure
>
> # The size of the instance to use. See http://www.rackspacecloud.com/cloud_hosting_products/serv$
> # id 3: 1GB, 1 virtual core
> # id 4: 2GB, 2 virtual cores
> # id 5: 4GB, 2 virtual cores
> # id 6: 8GB, 4 virtual cores
> # id 7: 15.5GB, 4 virtual cores
> whirr.hardware-id=4
> # Ubuntu 10.04 LTS Lucid
> whirr.image-id=49
>
> ________________________________________
> From: ext Tom White [tom@cloudera.com]
> Sent: Monday, January 10, 2011 7:03 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Can you post your Whirr properties file please (with credentials removed).
>
> Thanks
> Tom
>
> On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
>> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>>
>> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>>
>>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>>
>>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>>
>>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>>
>>>> --------------------------------
>>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>>> starting namenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-10
>>>> 6-96-62.static.cloud-ips.com.out May not run daemons as root. Please
>>>> specify HADOOP_NAMENODE_USER
>>>
>>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>>
>>> Tom
>>>
>>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>>> Are you sure you want to continue connecting (yes/no)? yes
>>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>>> localhost: starting datanode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-10
>>>> 6-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_DATANODE_USER
>>>> localhost: starting secondarynamenode, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynameno
>>>> de-184-106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-
>>>> 106-96-62.static.cloud-ips.com.out
>>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>>> localhost: starting tasktracker, logging to
>>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184
>>>> -106-96-62.static.cloud-ips.com.out
>>>> localhost: May not run daemons as root. Please specify
>>>> HADOOP_TASKTRACKER_USER
>>>> --------------------------------
>>>>
>>>> Praveen
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 5:08 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>>
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>>> Hi Tom,
>>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>>
>>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs
>>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>>
>>>>> I should say Whirr is cool so far!
>>>>>
>>>>> Thanks again
>>>>> Praveen
>>>>>
>>>>> -----Original Message-----
>>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>>
>>>>> Hi Praveen,
>>>>>
>>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>>
>>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hado
>>>>> op/
>>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopSer
>>>>> vic
>>>>> eController.java
>>>>>
>>>>> Finally, check out the recipes for advice on setting configuration
>>>>> for
>>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>>> Hello all,
>>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>>> laborious process (especially given that we have about 10
>>>>>> environments that we need to configure). So my question is about
>>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>>> Here is my current deployment.
>>>>>>
>>>>>> Glassfish (Node 1)
>>>>>> Mysql (Node 2)
>>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>>
>>>>>> We can install Glassfish and MySql manually but we would like to
>>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>>> jobs and then destroy cluster on the cloud. Primary purpose of
>>>>>> doing this is to make deployment easy and save costs. Since the
>>>>>> jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>>
>>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and
>>>>>> he was positive that I can do the above steps using Whirr. Has
>>>>>> anyone done this using Whirr on Rackspace. I could not find any
>>>>>> examples on how to dynamically install Hadoop cluster on
>>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>>
>>>>>> Thanks
>>>>>> Praveen
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>
>



-- 
Andrei Savu -- andreisavu.ro

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Another thing I noticed from whirr.log is below. It looks like it's trying to change ownership to the hadoop user, but the hadoop user doesn't exist on the hadoop master. Am I missing anything?

2011-01-11 16:28:11,979 DEBUG [jclouds.compute] (user thread 3) << stderr from runscript as root@xx.xx.xx.xx
+ DFS_DATA_DIR=/data/hadoop/hdfs/data
+ MAPRED_LOCAL_DIR=/data/hadoop/mapred/local
+ MAX_MAP_TASKS=2
+ MAX_REDUCE_TASKS=1
+ CHILD_OPTS=-Xmx550m
+ CHILD_ULIMIT=1126400
+ TMP_DIR='/data/tmp/hadoop-${user.name}'
+ mkdir -p /data/hadoop
+ chown hadoop:hadoop /data/hadoop
chown: invalid user: `hadoop:hadoop' 
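
As an illustrative stop-gap, creating the account by hand on the master should let the chown go through, though it would not address whatever kept the install script from creating the user in the first place:

  ssh root@<master-hostname> 'useradd hadoop'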

Praveen

-----Original Message-----
From: Peddi Praveen (Nokia-MS/Boston) 
Sent: Tuesday, January 11, 2011 11:27 AM
To: whirr-user@incubator.apache.org; tom@cloudera.com
Cc: hammer@cloudera.com
Subject: RE: Dynamic creation and destroying hadoop on Rackspace

whirr.hadoop-install-runurl=cloudera/cdh/install
whirr.hadoop-configure-runurl=cloudera/cdh/post-configure 

Does these two properties install all cloudera packages or just the hadoop from cloudera dist? I am thinking something could be wrong here...

Praveen

-----Original Message-----
From: Peddi Praveen (Nokia-MS/Boston)
Sent: Monday, January 10, 2011 7:22 PM
To: tom@cloudera.com
Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
Subject: RE: Dynamic creation and destroying hadoop on Rackspace

Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.

whirr.service-name=hadoop
whirr.cluster-name=relevancycluster
whirr.instance-templates=1 jt+nn,1 dn+tt whirr.provider=cloudservers whirr.identity=<rackspace-id> whirr.credential=<rckspace-api-password>
#whirr.private-key-file=/home/hadoop/.ssh/id_rsa
#whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub

# Uncomment out these lines to run CDH
whirr.hadoop-install-runurl=cloudera/cdh/install
whirr.hadoop-configure-runurl=cloudera/cdh/post-configure

# The size of the instance to use. See http://www.rackspacecloud.com/cloud_hosting_products/serv$
# id 3: 1GB, 1 virtual core
# id 4: 2GB, 2 virtual cores
# id 5: 4GB, 2 virtual cores
# id 6: 8GB, 4 virtual cores
# id 7: 15.5GB, 4 virtual cores
whirr.hardware-id=4
# Ubuntu 10.04 LTS Lucid
whirr.image-id=49

________________________________________
From: ext Tom White [tom@cloudera.com]
Sent: Monday, January 10, 2011 7:03 PM
To: Peddi Praveen (Nokia-MS/Boston)
Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

Can you post your Whirr properties file please (with credentials removed).

Thanks
Tom

On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>
> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>
>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>
>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>
>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>
>>> --------------------------------
>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>> starting namenode, logging to
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-10
>>> 6-96-62.static.cloud-ips.com.out May not run daemons as root. Please 
>>> specify HADOOP_NAMENODE_USER
>>
>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>
>> Tom
>>
>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>> Are you sure you want to continue connecting (yes/no)? yes
>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>> localhost: starting datanode, logging to 
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-10
>>> 6-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify 
>>> HADOOP_DATANODE_USER
>>> localhost: starting secondarynamenode, logging to 
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynameno
>>> de-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify 
>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-
>>> 106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>> localhost: starting tasktracker, logging to
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184
>>> -106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify 
>>> HADOOP_TASKTRACKER_USER
>>> --------------------------------
>>>
>>> Praveen
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 5:08 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>
>>> Tom
>>>
>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>> Hi Tom,
>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>
>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs 
>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>
>>>> I should say Whirr is cool so far!
>>>>
>>>> Thanks again
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Hi Praveen,
>>>>
>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>
>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hado
>>>> op/
>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopSer
>>>> vic
>>>> eController.java
>>>>
>>>> Finally, check out the recipes for advice on setting configuration 
>>>> for
>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>> Hello all,
>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have 
>>>>> manually installed and configrued Hadoop on Rackspace which is a 
>>>>> laborious process (especially given that we have about 10 
>>>>> environments that we need to configure). So my question is about 
>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>> Here is my current deployment.
>>>>>
>>>>> Glassfish (Node 1)
>>>>> Mysql (Node 2)
>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>
>>>>> We can install Glassfish and MySql manually but we would like to 
>>>>> dynamically create/install hadoop cluster, start the servers, run 
>>>>> jobs and then destroy cluster on the cloud. Primary purpose of 
>>>>> doing this is to make deployment easy and save costs. Since the 
>>>>> jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>
>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and 
>>>>> he was positive that I can do the above steps using Whirr. Has 
>>>>> anyone done this using Whirr on Rackspace. I could not find any 
>>>>> examples on how to dynamically install Hadoop cluster on 
>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
whirr.hadoop-install-runurl=cloudera/cdh/install
whirr.hadoop-configure-runurl=cloudera/cdh/post-configure 

Do these two properties install all of the Cloudera packages or just Hadoop from the Cloudera distribution? I am thinking something could be wrong here...

Praveen
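
On the question above of what those two run-urls actually install, one way to check is to log in to the master node and list the Hadoop-related packages. This is only a small sketch, assuming the default Ubuntu image (so dpkg is available); <master-public-ip> is a placeholder for the master's public address:

% ssh root@<master-public-ip> 'dpkg -l | grep -i hadoop'
% ssh root@<master-public-ip> 'hadoop version'

The first command shows which packages landed on the node; the second prints the Hadoop build that is actually on the path.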

-----Original Message-----
From: Peddi Praveen (Nokia-MS/Boston) 
Sent: Monday, January 10, 2011 7:22 PM
To: tom@cloudera.com
Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
Subject: RE: Dynamic creation and destroying hadoop on Rackspace

Here are the properties. Please note that I tried w/o specifying whirr.hadoop.install.runurl also and got the same problem.

whirr.service-name=hadoop
whirr.cluster-name=relevancycluster
whirr.instance-templates=1 jt+nn,1 dn+tt 
whirr.provider=cloudservers 
whirr.identity=<rackspace-id> 
whirr.credential=<rckspace-api-password>
#whirr.private-key-file=/home/hadoop/.ssh/id_rsa
#whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub

# Uncomment out these lines to run CDH
whirr.hadoop-install-runurl=cloudera/cdh/install
whirr.hadoop-configure-runurl=cloudera/cdh/post-configure

# The size of the instance to use. See http://www.rackspacecloud.com/cloud_hosting_products/serv$
# id 3: 1GB, 1 virtual core
# id 4: 2GB, 2 virtual cores
# id 5: 4GB, 2 virtual cores
# id 6: 8GB, 4 virtual cores
# id 7: 15.5GB, 4 virtual cores
whirr.hardware-id=4
# Ubuntu 10.04 LTS Lucid
whirr.image-id=49

________________________________________
From: ext Tom White [tom@cloudera.com]
Sent: Monday, January 10, 2011 7:03 PM
To: Peddi Praveen (Nokia-MS/Boston)
Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

Can you post your Whirr properties file please (with credentials removed).

Thanks
Tom

On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>
> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>
>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>
>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>
>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>
>>> --------------------------------
>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>> starting namenode, logging to 
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-10
>>> 6-96-62.static.cloud-ips.com.out May not run daemons as root. Please 
>>> specify HADOOP_NAMENODE_USER
>>
>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>
>> Tom
>>
>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>> Are you sure you want to continue connecting (yes/no)? yes
>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>> localhost: starting datanode, logging to 
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-10
>>> 6-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify 
>>> HADOOP_DATANODE_USER
>>> localhost: starting secondarynamenode, logging to 
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynameno
>>> de-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify 
>>> HADOOP_SECONDARYNAMENODE_USER starting jobtracker, logging to 
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-
>>> 106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>> localhost: starting tasktracker, logging to 
>>> /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184
>>> -106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify 
>>> HADOOP_TASKTRACKER_USER
>>> --------------------------------
>>>
>>> Praveen
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 5:08 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>
>>> Tom
>>>
>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>> Hi Tom,
>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>
>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs 
>>>> -lsr / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>
>>>> I should say Whirr is cool so far!
>>>>
>>>> Thanks again
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Hi Praveen,
>>>>
>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>
>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hado
>>>> op/ 
>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopSer
>>>> vic
>>>> eController.java
>>>>
>>>> Finally, check out the recipes for advice on setting configuration 
>>>> for
>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>> Hello all,
>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have 
>>>>> manually installed and configrued Hadoop on Rackspace which is a 
>>>>> laborious process (especially given that we have about 10 
>>>>> environments that we need to configure). So my question is about 
>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>> Here is my current deployment.
>>>>>
>>>>> Glassfish (Node 1)
>>>>> Mysql (Node 2)
>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>
>>>>> We can install Glassfish and MySql manually but we would like to 
>>>>> dynamically create/install hadoop cluster, start the servers, run 
>>>>> jobs and then destroy cluster on the cloud. Primary purpose of 
>>>>> doing this is to make deployment easy and save costs. Since the 
>>>>> jobs are run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>
>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and 
>>>>> he was positive that I can do the above steps using Whirr. Has 
>>>>> anyone done this using Whirr on Rackspace. I could not find any 
>>>>> examples on how to dynamically install Hadoop cluster on 
>>>>> Rackspace. Any information on this task would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Here are the properties. Please note that I also tried without specifying whirr.hadoop-install-runurl and got the same problem.

whirr.service-name=hadoop
whirr.cluster-name=relevancycluster
whirr.instance-templates=1 jt+nn,1 dn+tt
whirr.provider=cloudservers
whirr.identity=<rackspace-id>
whirr.credential=<rckspace-api-password>
#whirr.private-key-file=/home/hadoop/.ssh/id_rsa
#whirr.public-key-file=/home/hadoop/.ssh/id_rsa.pub

# Uncomment out these lines to run CDH
whirr.hadoop-install-runurl=cloudera/cdh/install
whirr.hadoop-configure-runurl=cloudera/cdh/post-configure

# The size of the instance to use. See http://www.rackspacecloud.com/cloud_hosting_products/serv$
# id 3: 1GB, 1 virtual core
# id 4: 2GB, 2 virtual cores
# id 5: 4GB, 2 virtual cores
# id 6: 8GB, 4 virtual cores
# id 7: 15.5GB, 4 virtual cores
whirr.hardware-id=4
# Ubuntu 10.04 LTS Lucid
whirr.image-id=49
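
For reference, a file like this is normally driven from the command-line tool. The lines below are only a sketch, assuming the file is saved as hadoop.properties and that the 0.2.0 CLI exposes launch-cluster and destroy-cluster subcommands:

% bin/whirr launch-cluster --config hadoop.properties
% bin/whirr destroy-cluster --config hadoop.properties

launch-cluster brings up the instances described by whirr.instance-templates; destroy-cluster shuts them down again once the jobs have finished.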

________________________________________
From: ext Tom White [tom@cloudera.com]
Sent: Monday, January 10, 2011 7:03 PM
To: Peddi Praveen (Nokia-MS/Boston)
Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

Can you post your Whirr properties file please (with credentials removed).

Thanks
Tom

On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>
> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>
>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>
>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>
>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>
>>> --------------------------------
>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>> starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
>>
>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>
>> Tom
>>
>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>> Are you sure you want to continue connecting (yes/no)? yes
>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>> localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
>>> localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
>>> starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>> localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
>>> --------------------------------
>>>
>>> Praveen
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 5:08 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>
>>> Tom
>>>
>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>> Hi Tom,
>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>
>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr
>>>> / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>
>>>> I should say Whirr is cool so far!
>>>>
>>>> Thanks again
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Hi Praveen,
>>>>
>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>
>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/
>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServic
>>>> eController.java
>>>>
>>>> Finally, check out the recipes for advice on setting configuration for
>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>> Hello all,
>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>> laborious process (especially given that we have about 10
>>>>> environments that we need to configure). So my question is about
>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>> Here is my current deployment.
>>>>>
>>>>> Glassfish (Node 1)
>>>>> Mysql (Node 2)
>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>
>>>>> We can install Glassfish and MySql manually but we would like to
>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>> jobs and then destroy cluster on the cloud. Primary purpose of doing
>>>>> this is to make deployment easy and save costs. Since the jobs are
>>>>> run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>
>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he
>>>>> was positive that I can do the above steps using Whirr. Has anyone
>>>>> done this using Whirr on Rackspace. I could not find any examples on
>>>>> how to dynamically install Hadoop cluster on Rackspace. Any
>>>>> information on this task would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Tom White <to...@cloudera.com>.
Can you post your Whirr properties file please (with credentials removed).

Thanks
Tom

On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
> I am using the latest Whirr. For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>
> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>
>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>
>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>
>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>
>>> --------------------------------
>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>> starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
>>
>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>
>> Tom
>>
>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>> Are you sure you want to continue connecting (yes/no)? yes
>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>> localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
>>> localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
>>> starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>> localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
>>> --------------------------------
>>>
>>> Praveen
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 5:08 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>
>>> Tom
>>>
>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>> Hi Tom,
>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>
>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr
>>>> / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>
>>>> I should say Whirr is cool so far!
>>>>
>>>> Thanks again
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Hi Praveen,
>>>>
>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>
>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/
>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServic
>>>> eController.java
>>>>
>>>> Finally, check out the recipes for advice on setting configuration for
>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>> Hello all,
>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>> laborious process (especially given that we have about 10
>>>>> environments that we need to configure). So my question is about
>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>> Here is my current deployment.
>>>>>
>>>>> Glassfish (Node 1)
>>>>> Mysql (Node 2)
>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>
>>>>> We can install Glassfish and MySql manually but we would like to
>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>> jobs and then destroy cluster on the cloud. Primary purpose of doing
>>>>> this is to make deployment easy and save costs. Since the jobs are
>>>>> run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>
>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he
>>>>> was positive that I can do the above steps using Whirr. Has anyone
>>>>> done this using Whirr on Rackspace. I could not find any examples on
>>>>> how to dynamically install Hadoop cluster on Rackspace. Any
>>>>> information on this task would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Yes, it is 0.2.0 from Whirr's website, not from Cloudera's.

http://mirror.cc.columbia.edu/pub/software/apache//incubator/whirr/stable/whirr-0.2.0-incubating.tar.gz

Praveen
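
A minimal sketch of pulling that tarball onto the client and sanity-checking it, assuming wget is available and that the CLI accepts a version subcommand:

% wget http://mirror.cc.columbia.edu/pub/software/apache//incubator/whirr/stable/whirr-0.2.0-incubating.tar.gz
% tar xzf whirr-0.2.0-incubating.tar.gz
% cd whirr-0.2.0-incubating
% bin/whirr version

If the version printed is not 0.2.0-incubating, the client is probably picking up a different whirr script from the PATH.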
________________________________________
From: ext Tom White [tom@cloudera.com]
Sent: Monday, January 10, 2011 7:30 PM
To: Peddi Praveen (Nokia-MS/Boston)
Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
> I am using the latest Whirr.

Which version specifically? 0.2.0, the one in CDH3b3, or trunk?

Thanks,
Tom

> For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>
> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>
>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>
>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>
>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>
>>> --------------------------------
>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>> starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
>>
>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>
>> Tom
>>
>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>> Are you sure you want to continue connecting (yes/no)? yes
>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>> localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
>>> localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
>>> starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>> localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
>>> --------------------------------
>>>
>>> Praveen
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 5:08 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>
>>> Tom
>>>
>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>> Hi Tom,
>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>
>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr
>>>> / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>
>>>> I should say Whirr is cool so far!
>>>>
>>>> Thanks again
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Hi Praveen,
>>>>
>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>
>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/
>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServic
>>>> eController.java
>>>>
>>>> Finally, check out the recipes for advice on setting configuration for
>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>> Hello all,
>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>> laborious process (especially given that we have about 10
>>>>> environments that we need to configure). So my question is about
>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>> Here is my current deployment.
>>>>>
>>>>> Glassfish (Node 1)
>>>>> Mysql (Node 2)
>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>
>>>>> We can install Glassfish and MySql manually but we would like to
>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>> jobs and then destroy cluster on the cloud. Primary purpose of doing
>>>>> this is to make deployment easy and save costs. Since the jobs are
>>>>> run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>
>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he
>>>>> was positive that I can do the above steps using Whirr. Has anyone
>>>>> done this using Whirr on Rackspace. I could not find any examples on
>>>>> how to dynamically install Hadoop cluster on Rackspace. Any
>>>>> information on this task would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Tom White <to...@cloudera.com>.
On Mon, Jan 10, 2011 at 3:59 PM,  <pr...@nokia.com> wrote:
> I am using the latest Whirr.

Which version specifically? 0.2.0, the one in CDH3b3, or trunk?

Thanks,
Tom

> For hadoop, I actually specified cloudera URL in the properties file but on master hadoop machine I saw references to hadoop-0.20. OS of my client is CentOS but I am going with default OS for hadoop which is Ubuntu 10.04.
>
> On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:
>
>> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>>>
>>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>>>
>>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>>>
>>> --------------------------------
>>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>>> starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
>>
>> That's the problem. Which version of Whirr, Hadoop, OS are you using?
>>
>> Tom
>>
>>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>>> Are you sure you want to continue connecting (yes/no)? yes
>>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>>> localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
>>> localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
>>> starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>>> localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>>> localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
>>> --------------------------------
>>>
>>> Praveen
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 5:08 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>
>>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>>>
>>> Tom
>>>
>>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>>> Hi Tom,
>>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>>>
>>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr
>>>> / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>>>
>>>> I should say Whirr is cool so far!
>>>>
>>>> Thanks again
>>>> Praveen
>>>>
>>>> -----Original Message-----
>>>> From: ext Tom White [mailto:tom@cloudera.com]
>>>> Sent: Monday, January 10, 2011 2:23 PM
>>>> To: Peddi Praveen (Nokia-MS/Boston)
>>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>>>
>>>> Hi Praveen,
>>>>
>>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>>>
>>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/
>>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServic
>>>> eController.java
>>>>
>>>> Finally, check out the recipes for advice on setting configuration for
>>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>>>
>>>> Thanks,
>>>> Tom
>>>>
>>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>>> Hello all,
>>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>>> laborious process (especially given that we have about 10
>>>>> environments that we need to configure). So my question is about
>>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>>> Here is my current deployment.
>>>>>
>>>>> Glassfish (Node 1)
>>>>> Mysql (Node 2)
>>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>>>
>>>>> We can install Glassfish and MySql manually but we would like to
>>>>> dynamically create/install hadoop cluster, start the servers, run
>>>>> jobs and then destroy cluster on the cloud. Primary purpose of doing
>>>>> this is to make deployment easy and save costs. Since the jobs are
>>>>> run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>>>
>>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he
>>>>> was positive that I can do the above steps using Whirr. Has anyone
>>>>> done this using Whirr on Rackspace. I could not find any examples on
>>>>> how to dynamically install Hadoop cluster on Rackspace. Any
>>>>> information on this task would be greatly appreciated.
>>>>>
>>>>> Thanks
>>>>> Praveen
>>>>>
>>>>>
>>>>>
>>>>
>>>
>

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
I am using the latest Whirr. For Hadoop, I actually specified the Cloudera URL in the properties file, but on the master Hadoop machine I saw references to hadoop-0.20. The OS of my client is CentOS, but I am going with the default OS for Hadoop, which is Ubuntu 10.04.
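
Seeing hadoop-0.20 references on the master is not by itself conclusive, since CDH3's own packages are also based on Hadoop 0.20; listing the installed packages and their versions shows whether the Cloudera build or a plain Apache Hadoop ended up on the node. A small sketch, assuming the default Ubuntu (dpkg-based) image on the master:

% dpkg -l | grep -i hadoop
% hadoop version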

On Jan 10, 2011, at 6:38 PM, "ext Tom White" <to...@cloudera.com> wrote:

> On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
>> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>> 
>> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>> 
>> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>> 
>> --------------------------------
>> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
>> starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
>> May not run daemons as root. Please specify HADOOP_NAMENODE_USER
> 
> That's the problem. Which version of Whirr, Hadoop, OS are you using?
> 
> Tom
> 
>> The authenticity of host 'localhost (127.0.0.1)' can't be established.
>> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
>> Are you sure you want to continue connecting (yes/no)? yes
>> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
>> localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
>> localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
>> localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
>> localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
>> starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
>> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
>> localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
>> localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
>> --------------------------------
>> 
>> Praveen
>> -----Original Message-----
>> From: ext Tom White [mailto:tom@cloudera.com]
>> Sent: Monday, January 10, 2011 5:08 PM
>> To: Peddi Praveen (Nokia-MS/Boston)
>> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>> 
>> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>> 
>> Tom
>> 
>> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>>> Hi Tom,
>>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>> 
>>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr
>>> / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>> 
>>> I should say Whirr is cool so far!
>>> 
>>> Thanks again
>>> Praveen
>>> 
>>> -----Original Message-----
>>> From: ext Tom White [mailto:tom@cloudera.com]
>>> Sent: Monday, January 10, 2011 2:23 PM
>>> To: Peddi Praveen (Nokia-MS/Boston)
>>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>> 
>>> Hi Praveen,
>>> 
>>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>> 
>>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/
>>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServic
>>> eController.java
>>> 
>>> Finally, check out the recipes for advice on setting configuration for
>>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>> 
>>> Thanks,
>>> Tom
>>> 
>>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>>> Hello all,
>>>> W have few Hadoop jobs that we are running on Hadoop RackSpace Cloud.
>>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>>> manually installed and configrued Hadoop on Rackspace which is a
>>>> laborious process (especially given that we have about 10
>>>> environments that we need to configure). So my question is about
>>>> automatic creation and desrtoying of Hadoop cluster using a program (preferably Java).
>>>> Here is my current deployment.
>>>> 
>>>> Glassfish (Node 1)
>>>> Mysql (Node 2)
>>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>> 
>>>> We can install Glassfish and MySql manually but we would like to
>>>> dynamically create/install hadoop cluster, start the servers, run
>>>> jobs and then destroy cluster on the cloud. Primary purpose of doing
>>>> this is to make deployment easy and save costs. Since the jobs are
>>>> run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>> 
>>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he
>>>> was positive that I can do the above steps using Whirr. Has anyone
>>>> done this using Whirr on Rackspace. I could not find any examples on
>>>> how to dynamically install Hadoop cluster on Rackspace. Any
>>>> information on this task would be greatly appreciated.
>>>> 
>>>> Thanks
>>>> Praveen
>>>> 
>>>> 
>>>> 
>>> 
>> 

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Tom White <to...@cloudera.com>.
On Mon, Jan 10, 2011 at 2:22 PM,  <pr...@nokia.com> wrote:
> Looks like hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on master node either.
>
> root@hadoop-master:~# netstat -a | grep 50030 returns nothing
>
> Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from hadoop master, I see following:
>
> --------------------------------
> root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh
> starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
> May not run daemons as root. Please specify HADOOP_NAMENODE_USER

That's the problem. Which version of Whirr, Hadoop, OS are you using?

Tom
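
For context, the error in the quoted transcript comes from the CDH start scripts refusing to run daemons directly as root. If one does need to start things by hand on a CDH image, the usual routes are the packaged init scripts, for example:

% /etc/init.d/hadoop-0.20-namenode start
% /etc/init.d/hadoop-0.20-jobtracker start

or exporting the HADOOP_*_USER variables named in the error before rerunning start-all.sh. This is a sketch only, assuming the CDH3 init-script names and that a hadoop user exists on the image:

% export HADOOP_NAMENODE_USER=hadoop HADOOP_DATANODE_USER=hadoop
% export HADOOP_SECONDARYNAMENODE_USER=hadoop
% export HADOOP_JOBTRACKER_USER=hadoop HADOOP_TASKTRACKER_USER=hadoop
% /etc/alternatives/hadoop-lib/bin/start-all.sh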

> The authenticity of host 'localhost (127.0.0.1)' can't be established.
> RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
> Are you sure you want to continue connecting (yes/no)? yes
> localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
> localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
> localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
> localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
> localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
> starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
> May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
> localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
> localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
> --------------------------------
>
> Praveen
> -----Original Message-----
> From: ext Tom White [mailto:tom@cloudera.com]
> Sent: Monday, January 10, 2011 5:08 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.
>
> Tom
>
> On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
>> Hi Tom,
>> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried Java client yet (we will do it soon). But with command line tool, we could not access hadoop fs and any of the hadoop command. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to hadoo master node. We tried as hadoop user and root but no luck. Do you think we are missing anything?
>>
>> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr
>> / 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED:
>> hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
>> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
>> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
>> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
>> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
>> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
>> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
>> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
>> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
>> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>>
>> I should say Whirr is cool so far!
>>
>> Thanks again
>> Praveen
>>
>> -----Original Message-----
>> From: ext Tom White [mailto:tom@cloudera.com]
>> Sent: Monday, January 10, 2011 2:23 PM
>> To: Peddi Praveen (Nokia-MS/Boston)
>> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
>> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>>
>> Hi Praveen,
>>
>> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>>
>> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/
>> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServic
>> eController.java
>>
>> Finally, check out the recipes for advice on setting configuration for
>> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>>
>> Thanks,
>> Tom
>>
>> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>>> Hello all,
>>> We have a few Hadoop jobs that we are running on the Rackspace Cloud.
>>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>>> manually installed and configured Hadoop on Rackspace, which is a
>>> laborious process (especially given that we have about 10
>>> environments that we need to configure). So my question is about the
>>> automatic creation and destruction of a Hadoop cluster using a program (preferably Java).
>>> Here is my current deployment.
>>>
>>> Glassfish (Node 1)
>>> Mysql (Node 2)
>>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>>
>>> We can install Glassfish and MySql manually but we would like to
>>> dynamically create/install hadoop cluster, start the servers, run
>>> jobs and then destroy cluster on the cloud. Primary purpose of doing
>>> this is to make deployment easy and save costs. Since the jobs are
>>> run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>>
>>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he
>>> was positive that I can do the above steps using Whirr. Has anyone
>>> done this using Whirr on Rackspace? I could not find any examples on
>>> how to dynamically install Hadoop cluster on Rackspace. Any
>>> information on this task would be greatly appreciated.
>>>
>>> Thanks
>>> Praveen
>>>
>>>
>>>
>>
>

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
It looks like Hadoop was installed but never started on the master node. There were no files under /var/log/hadoop on the master node either.

root@hadoop-master:~# netstat -a | grep 50030 returns nothing

Does Whirr install and start Hadoop as "root"? Is that the problem? When I try to start Hadoop manually from the hadoop master, I see the following:

--------------------------------
root@hadoop-master:~# /etc/alternatives/hadoop-lib/bin/start-all.sh 
starting namenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-namenode-184-106-96-62.static.cloud-ips.com.out
May not run daemons as root. Please specify HADOOP_NAMENODE_USER
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is d4:3c:55:4d:76:62:3d:b2:e1:74:a7:6f:bf:92:ab:3d.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
localhost: starting datanode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-datanode-184-106-96-62.static.cloud-ips.com.out
localhost: May not run daemons as root. Please specify HADOOP_DATANODE_USER
localhost: starting secondarynamenode, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-secondarynamenode-184-106-96-62.static.cloud-ips.com.out
localhost: May not run daemons as root. Please specify HADOOP_SECONDARYNAMENODE_USER
starting jobtracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-jobtracker-184-106-96-62.static.cloud-ips.com.out
May not run daemons as root. Please specify HADOOP_JOBTRACKER_USER
localhost: starting tasktracker, logging to /etc/alternatives/hadoop-lib/bin/../logs/hadoop-root-tasktracker-184-106-96-62.static.cloud-ips.com.out
localhost: May not run daemons as root. Please specify HADOOP_TASKTRACKER_USER
--------------------------------
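
(A note on the "May not run daemons as root" lines above: judging from that output, the installed Hadoop scripts contain a safeguard that refuses to start a daemon as root unless a HADOOP_<DAEMON>_USER variable names the account to run it as. Assuming Whirr's install scripts created a hadoop user on the node (worth checking with getent passwd hadoop), a manual start would look something like:

   su - hadoop -c '/etc/alternatives/hadoop-lib/bin/start-all.sh'

Alternatively, exporting HADOOP_NAMENODE_USER=hadoop, HADOOP_DATANODE_USER=hadoop, and so on in hadoop-env.sh before running start-all.sh should satisfy the check. In a Whirr-managed cluster the daemons are expected to be started by launch-cluster itself, so a manual start is only useful as a diagnostic.)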

Praveen
-----Original Message-----
From: ext Tom White [mailto:tom@cloudera.com] 
Sent: Monday, January 10, 2011 5:08 PM
To: Peddi Praveen (Nokia-MS/Boston)
Cc: hammer@cloudera.com; whirr-user@incubator.apache.org
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

Can you connect to the jobtracker UI? It's running on the master, port 50030. You can also ssh into the machine and look at the logs under /var/log/hadoop to see if there are any errors.

Tom

On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
> Hi Tom,
> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried the Java client yet (we will do it soon). But with the command line tool, we could not access hadoop fs or any of the hadoop commands. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to the hadoop master node. We tried as the hadoop user and as root but no luck. Do you think we are missing anything?
>
> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>
> I should say Whirr is cool so far!
>
> Thanks again
> Praveen
>
> -----Original Message-----
> From: ext Tom White [mailto:tom@cloudera.com]
> Sent: Monday, January 10, 2011 2:23 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Hi Praveen,
>
> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>
> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/
> src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServic
> eController.java
>
> Finally, check out the recipes for advice on setting configuration for
> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>
> Thanks,
> Tom
>
> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>> Hello all,
>> We have a few Hadoop jobs that we are running on the Rackspace Cloud.
>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>> manually installed and configured Hadoop on Rackspace, which is a
>> laborious process (especially given that we have about 10
>> environments that we need to configure). So my question is about the
>> automatic creation and destruction of a Hadoop cluster using a program (preferably Java).
>> Here is my current deployment.
>>
>> Glassfish (Node 1)
>> Mysql (Node 2)
>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>
>> We can install Glassfish and MySql manually but we would like to 
>> dynamically create/install hadoop cluster, start the servers, run 
>> jobs and then destroy cluster on the cloud. Primary purpose of doing 
>> this is to make deployment easy and save costs. Since the jobs are 
>> run only for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>
>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he 
>> was positive that I can do the above steps using Whirr. Has anyone 
>> done this using Whirr on Rackspace? I could not find any examples on
>> how to dynamically install Hadoop cluster on Rackspace. Any 
>> information on this task would be greatly appreciated.
>>
>> Thanks
>> Praveen
>>
>>
>>
>

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Tom White <to...@cloudera.com>.
Can you connect to the jobtracker UI? It's running on the master, port
50030. You can also ssh into the machine and look at the logs under
/var/log/hadoop to see if there are any errors.
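
For example, something along these lines run on the master itself (the exact log file names depend on which Hadoop distribution Whirr installed, so adjust the patterns as needed):

   netstat -nlp | grep -E ':8020|:50030'     # namenode RPC and jobtracker web UI ports
   ls /var/log/hadoop/
   tail -n 50 /var/log/hadoop/*namenode*     # look for bind or startup exceptions

From the client side, the jobtracker UI at http://<master hostname>:50030/ should be reachable once your browser is configured to use the SOCKS proxy started by hadoop-proxy.sh.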

Tom

On Mon, Jan 10, 2011 at 12:33 PM,  <pr...@nokia.com> wrote:
> Hi Tom,
> Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried the Java client yet (we will do it soon). But with the command line tool, we could not access hadoop fs or any of the hadoop commands. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to the hadoop master node. We tried as the hadoop user and as root but no luck. Do you think we are missing anything?
>
> [root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
> 11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
> 11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
> 11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
> 11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
> 11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
> 11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
> 11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
> 11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
> 11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
> 11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
> 11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
>
> I should say Whirr is cool so far!
>
> Thanks again
> Praveen
>
> -----Original Message-----
> From: ext Tom White [mailto:tom@cloudera.com]
> Sent: Monday, January 10, 2011 2:23 PM
> To: Peddi Praveen (Nokia-MS/Boston)
> Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
> Subject: Re: Dynamic creation and destroying hadoop on Rackspace
>
> Hi Praveen,
>
> You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.
>
> http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServiceController.java
>
> Finally, check out the recipes for advice on setting configuration for
> Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
>
> Thanks,
> Tom
>
> On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
>> Hello all,
>> We have a few Hadoop jobs that we are running on the Rackspace Cloud.
>> The jobs run for a total of 3 to 5 hours a day. Currently I have
>> manually installed and configured Hadoop on Rackspace, which is a
>> laborious process (especially given that we have about 10 environments
>> that we need to configure). So my question is about the automatic creation
>> and destruction of a Hadoop cluster using a program (preferably Java).
>> Here is my current deployment.
>>
>> Glassfish (Node 1)
>> Mysql (Node 2)
>> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>>
>> We can install Glassfish and MySql manually but we would like to
>> dynamically create/install hadoop cluster, start the servers, run jobs
>> and then destroy cluster on the cloud. Primary purpose of doing this
>> is to make deployment easy and save costs. Since the jobs are run only
>> for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>>
>> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he
>> was positive that I can do the above steps using Whirr. Has anyone
>> done this using Whirr on Rackspace? I could not find any examples on
>> how to dynamically install Hadoop cluster on Rackspace. Any
>> information on this task would be greatly appreciated.
>>
>> Thanks
>> Praveen
>>
>>
>>
>

RE: Dynamic creation and destroying hadoop on Rackspace

Posted by pr...@nokia.com.
Hi Tom,
Thank you very much for your response. We were able to figure out how to launch and destroy the cluster using the command line tool. We haven't tried the Java client yet (we will do it soon). But with the command line tool, we could not access hadoop fs or any of the hadoop commands. We also ran the proxy script. Here is the error I am getting. My client node is not able to talk to the hadoop master node. We tried as the hadoop user and as root but no luck. Do you think we are missing anything?

[root@hadoop-master ~]# /usr/local/software/hadoop/bin/hadoop fs -lsr /
11/01/10 20:29:17 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
11/01/10 20:29:18 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 0 time(s).
11/01/10 20:29:19 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 1 time(s).
11/01/10 20:29:20 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 2 time(s).
11/01/10 20:29:21 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 3 time(s).
11/01/10 20:29:22 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 4 time(s).
11/01/10 20:29:23 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 5 time(s).
11/01/10 20:29:24 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 6 time(s).
11/01/10 20:29:25 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 7 time(s).
11/01/10 20:29:26 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 8 time(s).
11/01/10 20:29:27 INFO ipc.Client: Retrying connect to server: 184-106-158-27.static.cloud-ips.com/184.106.158.27:8020. Already tried 9 time(s).
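
(Port 8020 is the HDFS namenode's RPC port, so a string of retries like this usually means the namenode process is not running or not listening on that address, rather than a client-side problem. A quick check on the master, assuming a JDK with jps on the path:

   jps | grep -E 'NameNode|JobTracker'
   netstat -nlp | grep 8020

If neither shows anything, the daemons were never started and the logs under /var/log/hadoop are the place to look next.)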

I should say Whirr is cool so far!

Thanks again
Praveen

-----Original Message-----
From: ext Tom White [mailto:tom@cloudera.com] 
Sent: Monday, January 10, 2011 2:23 PM
To: Peddi Praveen (Nokia-MS/Boston)
Cc: whirr-user@incubator.apache.org; hammer@cloudera.com
Subject: Re: Dynamic creation and destroying hadoop on Rackspace

Hi Praveen,

You should be able to do exactly this using Whirr. There's not a lot of documentation to describe what you want to do, but I recommend you start by having a look at http://incubator.apache.org/whirr/. The Hadoop unit tests will show you how to start and stop a cluster from Java and submit a job. E.g.

http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServiceController.java

Finally, check out the recipes for advice on setting configuration for
Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.

Thanks,
Tom

On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
> Hello all,
> We have a few Hadoop jobs that we are running on the Rackspace Cloud.
> The jobs run for a total of 3 to 5 hours a day. Currently I have
> manually installed and configured Hadoop on Rackspace, which is a
> laborious process (especially given that we have about 10 environments
> that we need to configure). So my question is about the automatic creation
> and destruction of a Hadoop cluster using a program (preferably Java).
> Here is my current deployment.
>
> Glassfish (Node 1)
> Mysql (Node 2)
> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>
> We can install Glassfish and MySql manually but we would like to 
> dynamically create/install hadoop cluster, start the servers, run jobs 
> and then destroy cluster on the cloud. Primary purpose of doing this 
> is to make deployment easy and save costs. Since the jobs are run only 
> for few hours a day we don't want to have Hadoop running on the cloud for the whole day.
>
> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he 
> was positive that I can do the above steps using Whirr. Has anyone 
> done this using Whirr on Rackspace? I could not find any examples on
> how to dynamically install Hadoop cluster on Rackspace. Any 
> information on this task would be greatly appreciated.
>
> Thanks
> Praveen
>
>
>

Re: Dynamic creation and destroying hadoop on Rackspace

Posted by Tom White <to...@cloudera.com>.
Hi Praveen,

You should be able to do exactly this using Whirr. There's not a lot
of documentation to describe what you want to do, but I recommend you
start by having a look at http://incubator.apache.org/whirr/. The
Hadoop unit tests will show you how to start and stop a cluster from
Java and submit a job. E.g.

http://svn.apache.org/repos/asf/incubator/whirr/trunk/services/hadoop/src/test/java/org/apache/whirr/service/hadoop/integration/HadoopServiceController.java

Finally, check out the recipes for advice on setting configuration for
Rackspace: http://svn.apache.org/repos/asf/incubator/whirr/trunk/recipes/hadoop-rackspace.properties.
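
For the command-line route, the whole launch / run / destroy cycle looks roughly like this (a minimal sketch modelled on the quick start guide and that recipe; property names and the Hadoop role abbreviations vary a little between Whirr releases, so treat the recipe file as the reference). A hadoop-rackspace.properties along these lines:

   whirr.cluster-name=myhadoopcluster
   whirr.instance-templates=1 jt+nn,5 dn+tt
   whirr.provider=cloudservers
   whirr.identity=<your Rackspace user name>
   whirr.credential=<your Rackspace API key>

and then:

   % bin/whirr launch-cluster --config hadoop-rackspace.properties
   % . ~/.whirr/myhadoopcluster/hadoop-proxy.sh     (in a separate terminal, left running)
     ... submit your jobs against the cluster ...
   % bin/whirr destroy-cluster --config hadoop-rackspace.properties

The HadoopServiceController test above drives the same cycle from Java, so it is the natural starting point for doing this programmatically.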

Thanks,
Tom

On Mon, Jan 10, 2011 at 10:27 AM,  <pr...@nokia.com> wrote:
> Hello all,
> We have a few Hadoop jobs that we are running on the Rackspace Cloud. The
> jobs run for a total of 3 to 5 hours a day. Currently I have manually
> installed and configured Hadoop on Rackspace, which is a laborious process
> (especially given that we have about 10 environments that we need to
> configure). So my question is about the automatic creation and destruction
> of a Hadoop cluster using a program (preferably Java). Here is my current
> deployment.
>
> Glassfish (Node 1)
> Mysql (Node 2)
> Hadoop with 1 master and 5 Slaves (Nodes 3 to 8)
>
> We can install Glassfish and MySql manually but we would like to dynamically
> create/install hadoop cluster, start the servers, run jobs and then destroy
> cluster on the cloud. Primary purpose of doing this is to make deployment
> easy and save costs. Since the jobs are run only for few hours a day we
> don't want to have Hadoop running on the cloud for the whole day.
>
> Jeff Hammerbacher from Cloudera had suggested I look at Whirr and he was
> positive that I can do the above steps using Whirr. Has anyone done this
> using Whirr on Rackspace? I could not find any examples on how to
> dynamically install Hadoop cluster on Rackspace. Any information on this
> task would be greatly appreciated.
>
> Thanks
> Praveen
>
>
>