Posted to common-user@hadoop.apache.org by Prasan Ary <vo...@yahoo.com> on 2008/03/20 18:15:48 UTC

Hadoop on EC2 for large cluster

Hi All,
  I have been trying to configure Hadoop on EC2 for a large cluster (100+ nodes). It seems that I have to copy the EC2 private key to every machine in the cluster so that they can make SSH connections to each other.
  For now it seems I have to run a script that copies the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
   
  Thanks,
  PA

       
---------------------------------
Never miss a thing.   Make Yahoo your homepage.

Re: Hadoop on EC2 for large cluster

Posted by Andreas Kostyrka <an...@kostyrka.org>.
Actually, I personally use the following two-part copy technique to copy
files to a cluster of boxes:

tar cf - myfile | dsh -f host-list-file -i -c -M tar xCfv /tmp -

The first tar packages myfile into a tar stream on stdout.

dsh runs a tar on each box that unpacks the stream (in the above case,
every box listed in host-list-file ends up with /tmp/myfile afterwards).

Relevant tar options include C (change directory before unpacking) and
v (verbose; can be given twice), so you can see what got copied.

dsh options that are relevant:
-i  copy stdin to all ssh processes (requires -c)
-c  run the ssh calls concurrently
-M  prefix the output from each ssh with its hostname

While this is not rsync, it has the benefit of running on all boxes
concurrently, and it is quite flexible.
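The tar-pipe half of the trick can be tried locally without dsh; a minimal sketch using hypothetical /tmp paths, where the second tar plays the role of the one dsh would run on each remote box:

```shell
# pack a file into a tar stream and unpack it into another directory,
# just as the dsh recipe does on each slave (paths are illustrative)
mkdir -p /tmp/demo-src /tmp/demo-dst
echo "hello" > /tmp/demo-src/myfile
( cd /tmp/demo-src && tar cf - myfile ) | tar xCf /tmp/demo-dst -
cat /tmp/demo-dst/myfile   # prints "hello"
```

In the real invocation the pipe simply feeds dsh instead of a local tar, and -i fans the stream out to every ssh process.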

Andreas

On Thursday, 2008-03-20 at 19:57 +0200, Andrey Pankov wrote:
> Hi,
> 
> Did you see hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script? It 
> already contains such part:
> 
> echo "Copying private key to slaves"
> for slave in `cat slaves`; do
>    scp $SSH_OPTS $PRIVATE_KEY_PATH "root@$slave:/root/.ssh/id_rsa"
>    ssh $SSH_OPTS "root@$slave" "chmod 600 /root/.ssh/id_rsa"
>    sleep 1
> done
> 
> Anyway, have you tried the hadoop-ec2 script? It works well for the task
> you described.
> 
> 
> Prasan Ary wrote:
> > Hi All,
> >   I have been trying to configure Hadoop on EC2 for large number of clusters ( 100 plus). It seems that I have to copy EC2 private key to all the machines in the cluster so that they can have SSH connections.
> >   For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
> >    
> >   Thanks,
> >   PA
> > 
> 
> ---
> Andrey Pankov

Re: Hadoop on EC2 for large cluster

Posted by Andrey Pankov <ap...@iponweb.net>.
Hi,

Have you seen the hadoop-0.16.0/src/contrib/ec2/bin/start-hadoop script?
It already contains this part:

echo "Copying private key to slaves"
for slave in `cat slaves`; do
   scp $SSH_OPTS $PRIVATE_KEY_PATH "root@$slave:/root/.ssh/id_rsa"
   ssh $SSH_OPTS "root@$slave" "chmod 600 /root/.ssh/id_rsa"
   sleep 1
done
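For reference, the variables that loop relies on come from the EC2 contrib environment script (hadoop-ec2-env.sh); in the stock scripts they look roughly like this — exact paths and values may differ between releases, so treat these as illustrative:

```shell
# approximate settings from hadoop-ec2-env.sh (check your release)
PRIVATE_KEY_PATH=$HOME/.ec2/id_rsa-gsg-keypair
SSH_OPTS="-i $PRIVATE_KEY_PATH -o StrictHostKeyChecking=no"
```

StrictHostKeyChecking=no matters here: freshly booted instances have unknown host keys, and without it every scp/ssh in the loop would stall on an interactive prompt.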

Anyway, have you tried the hadoop-ec2 script? It works well for the task
you described.


Prasan Ary wrote:
> Hi All,
>   I have been trying to configure Hadoop on EC2 for large number of clusters ( 100 plus). It seems that I have to copy EC2 private key to all the machines in the cluster so that they can have SSH connections.
>   For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>    
>   Thanks,
>   PA
> 

---
Andrey Pankov

Re: Hadoop on EC2 for large cluster

Posted by Tom White <to...@gmail.com>.
Yes, this isn't ideal for larger clusters. There's a jira to address
this: https://issues.apache.org/jira/browse/HADOOP-2410.

Tom

On 20/03/2008, Prasan Ary <vo...@yahoo.com> wrote:
> Hi All,
>   I have been trying to configure Hadoop on EC2 for large number of clusters ( 100 plus). It seems that I have to copy EC2 private key to all the machines in the cluster so that they can have SSH connections.
>   For now it seems I have to run a script to copy the key file to each of the EC2 instances. I wanted to know if there is a better way to accomplish this.
>
>   Thanks,
>
>   PA
>


-- 
Blog: http://www.lexemetech.com/

Re: Hadoop on EC2 for large cluster

Posted by Chris K Wensel <ch...@wensel.net>.
you can't do this with the contrib/ec2 scripts/ami.

but passing the master private dns name to the slaves on boot as
'user-data' works fine. when a slave starts, it contacts the master and
joins the cluster. there is no need for a slave to rsync from the
master, which removes the dependency on the slaves having the private
key. and by not using the start|stop-all scripts, you don't need to
maintain the slaves file, and can thus lazily boot your cluster.

to do this, you will need to create your own AMI that works this way.
not hard, just time consuming.
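a rough sketch of the approach (the command names, flags, and variables here are illustrative of the era's EC2 API tools, not the exact scripts; the 169.254.169.254 metadata endpoint is the real EC2 mechanism for reading user-data back):

```shell
# on launch: pass the master's private DNS name to each slave as user-data
ec2-run-instances $SLAVE_AMI -n 20 -k gsg-keypair -d "$MASTER_PRIVATE_DNS"

# in the slave AMI's boot script: read the user-data back from the
# metadata service and point the hadoop config at the master
MASTER_HOST=$(curl -s http://169.254.169.254/latest/user-data)
sed -i "s/MASTER_HOST_PLACEHOLDER/$MASTER_HOST/" \
    /usr/local/hadoop/conf/hadoop-site.xml
```

since each slave configures itself at boot, the master never needs a slaves file and new instances can be added at any time.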

On Mar 20, 2008, at 11:56 AM, Prasan Ary wrote:
> Chris,
>  What do you mean when you say boot the slaves with "the master  
> private name" ?
>
>
>  =======================
>
> Chris K Wensel <ch...@wensel.net> wrote:
>  I found it much better to start the master first, then boot the  
> slaves
> with the master private name.
>
> i do not use the start|stop-all scripts, so i do not need to maintain
> the slaves file. thus i don't need to push private keys around to
> support those scripts.
>
> this lets me start 20 nodes, then add 20 more later. or kill some.
>
> btw, get ganglia installed. life will be better knowing what's going  
> on.
>
> also, setting up FoxyProxy on firefox lets you browse your whole
> cluster if you setup a ssh tunnel (socks).
>
> On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:
>> Hi All,
>> I have been trying to configure Hadoop on EC2 for large number of
>> clusters ( 100 plus). It seems that I have to copy EC2 private key
>> to all the machines in the cluster so that they can have SSH
>> connections.
>> For now it seems I have to run a script to copy the key file to
>> each of the EC2 instances. I wanted to know if there is a better way
>> to accomplish this.
>>
>> Thanks,
>> PA
>>
>
> Chris K Wensel
> chris@wensel.net
> http://chris.wensel.net/
>
>
>
>
>
>
> ---------------------------------
> Looking for last minute shopping deals?  Find them fast with Yahoo!  
> Search.

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/




Re: Hadoop on EC2 for large cluster

Posted by Prasan Ary <vo...@yahoo.com>.
Chris,
  What do you mean when you say boot the slaves with "the master private name" ?
   
   
  =======================

Chris K Wensel <ch...@wensel.net> wrote:
  I found it much better to start the master first, then boot the slaves 
with the master private name.

i do not use the start|stop-all scripts, so i do not need to maintain 
the slaves file. thus i don't need to push private keys around to 
support those scripts.

this lets me start 20 nodes, then add 20 more later. or kill some.

btw, get ganglia installed. life will be better knowing what's going on.

also, setting up FoxyProxy on firefox lets you browse your whole 
cluster if you setup a ssh tunnel (socks).

On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:
> Hi All,
> I have been trying to configure Hadoop on EC2 for large number of 
> clusters ( 100 plus). It seems that I have to copy EC2 private key 
> to all the machines in the cluster so that they can have SSH 
> connections.
> For now it seems I have to run a script to copy the key file to 
> each of the EC2 instances. I wanted to know if there is a better way 
> to accomplish this.
>
> Thanks,
> PA
>
>

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/





       

Re: Hadoop on EC2 for large cluster

Posted by Chris K Wensel <ch...@wensel.net>.
I found it much better to start the master first, then boot the slaves  
with the master private name.

i do not use the start|stop-all scripts, so i do not need to maintain  
the slaves file. thus i don't need to push private keys around to  
support those scripts.

this lets me start 20 nodes, then add 20 more later. or kill some.

btw, get ganglia installed. life will be better knowing what's going on.

also, setting up FoxyProxy on firefox lets you browse your whole  
cluster if you set up an ssh tunnel (SOCKS).
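the tunnel is just ssh's built-in dynamic port forwarding; a sketch with an illustrative key path and port:

```shell
# open a SOCKS proxy on local port 6666, tunnelled through the master
ssh -i ~/.ec2/id_rsa-gsg-keypair -D 6666 root@$MASTER_PUBLIC_DNS

# then point FoxyProxy (or the browser's SOCKS proxy setting) at
# localhost:6666 and browse the web UIs by private dns name, e.g.
# http://<master-private-dns>:50030/ for the JobTracker
```

because the browser resolves names through the tunnel, the EC2-internal private dns names of every node work from your desk, so you can click through to any slave's status pages.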

On Mar 20, 2008, at 10:15 AM, Prasan Ary wrote:
> Hi All,
>  I have been trying to configure Hadoop on EC2 for large number of  
> clusters ( 100 plus). It seems that I have to copy EC2 private key  
> to all the machines in the cluster so that they can have SSH  
> connections.
>  For now it seems I have to run a script to copy the key file to  
> each of the EC2 instances. I wanted to know if there is a better way  
> to accomplish this.
>
>  Thanks,
>  PA
>
>

Chris K Wensel
chris@wensel.net
http://chris.wensel.net/