Posted to common-dev@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2007/01/11 10:30:27 UTC

[jira] Created: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Create scripts to run Hadoop on Amazon EC2
------------------------------------------

                 Key: HADOOP-884
                 URL: https://issues.apache.org/jira/browse/HADOOP-884
             Project: Hadoop
          Issue Type: New Feature
          Components: fs
    Affects Versions: 0.10.1
            Reporter: Tom White
         Assigned To: Tom White


It is already possible to run Hadoop on Amazon EC2 (http://wiki.apache.org/lucene-hadoop/AmazonEC2); however, it is a rather involved, largely manual process. Writing scripts to automate image creation and cluster launch (as far as is possible) will make it much easier to use Hadoop on EC2.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Updated: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-884:
-----------------------------

    Component/s:     (was: fs)
                 scripts


[jira] Updated: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-884:
-----------------------------

    Attachment: hadoop-ec2-v1.tar.gz


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468423 ] 

Hadoop QA commented on HADOOP-884:
----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12349853/hadoop-884.patch applied and successfully tested against trunk revision r501182.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465883 ] 

Tom White commented on HADOOP-884:
----------------------------------

Yes, contrib/ec2/bin/ sounds like the right place.


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465698 ] 

Tom White commented on HADOOP-884:
----------------------------------

I've attached a collection of scripts for this feature. They are still rough round the edges and not ready for inclusion yet (indeed they should probably be kept separate from the Hadoop distribution), but the scripts work for me on Mac OS X and Ubuntu. I've added instructions to the wiki at http://wiki.apache.org/lucene-hadoop/AmazonEC2.

There are lots of improvements that could be made. 

 * Create a Hadoop AMI that runs a parameterized launch to set cluster size and master hostname. See http://docs.amazonwebservices.com/AmazonEC2/dg/2006-10-01/AESDG-chapter-instancedata.html. Such an instance would modify the Hadoop config files on startup to reflect cluster size and master hostname.
 * Setting up DNS is a pain. We could either automate the DNS configuration using DynDNS's webservice (https://www.dyndns.com/developers/specs/syntax.html), or do away with having to set up DNS altogether.
 * Create a public Hadoop AMI (for each Hadoop version) so people don't need to build their own. See http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured.
 * Adapt `run-hadoop-cluster` to take the jar containing the MapReduce job as a parameter.
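
To illustrate the first item, here is a minimal sketch of how an instance might rewrite its Hadoop configuration at startup once it knows the cluster size and master hostname. It is not taken from the attached scripts: the function name, the port numbers and the tasks-per-node multipliers are all hypothetical, and only the property names (fs.default.name, mapred.job.tracker, mapred.map.tasks, mapred.reduce.tasks) are standard Hadoop settings.

```python
from xml.sax.saxutils import escape

# Hypothetical defaults for illustration only; not from the attached scripts.
NAMENODE_PORT = 50001
JOBTRACKER_PORT = 50002
MAPS_PER_NODE = 10
REDUCES_PER_NODE = 3


def render_hadoop_site(master_host, cluster_size):
    """Return hadoop-site.xml text for the given master and cluster size."""
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    properties = [
        ("fs.default.name", f"{master_host}:{NAMENODE_PORT}"),
        ("mapred.job.tracker", f"{master_host}:{JOBTRACKER_PORT}"),
        # Scale the default task counts with the number of nodes.
        ("mapred.map.tasks", str(cluster_size * MAPS_PER_NODE)),
        ("mapred.reduce.tasks", str(cluster_size * REDUCES_PER_NODE)),
    ]
    lines = ['<?xml version="1.0"?>', "<configuration>"]
    for name, value in properties:
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_hadoop_site("master.example.com", 4))
```

A startup script on the image could write this text over conf/hadoop-site.xml before starting the daemons, so the same image would serve any cluster size.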



[jira] Updated: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-884:
-----------------------------

    Status: Patch Available  (was: In Progress)


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465855 ] 

Doug Cutting commented on HADOOP-884:
-------------------------------------

I don't think these should go in the normal bin/ directory, but I think including them in the distribution tarfile might be good.  They could perhaps go in contrib/ec2/bin/?


[jira] Work started: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HADOOP-884 started by Tom White.


[jira] Updated: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-884:
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.11.0
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Tom!

A couple of future improvements to ponder:
  - perhaps the env file shouldn't be in Subversion; instead, a template could be checked in and copied into place.  That way we don't risk checking in an edited version.
  - a bit of documentation, perhaps just a README, should ideally be bundled with this.
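
The template idea in the first point could look roughly like the sketch below. This is only an illustration under assumed names: install_env_file is hypothetical, and any template and target file names passed to it are placeholders rather than files from the committed patch. The point is that the copy happens only when the user has no env file yet, so local edits are never overwritten and the edited file itself is never the one under version control.

```python
import shutil
from pathlib import Path


def install_env_file(template, target):
    """Copy the checked-in template to target unless target already exists.

    Returns True if the file was created, False if an existing file was kept.
    """
    template = Path(template)
    target = Path(target)
    if target.exists():
        # Keep whatever the user has already edited.
        return False
    shutil.copyfile(template, target)
    return True
```

A setup step could call this once, for example install_env_file("env.sh.template", "env.sh"), with the copied file listed in svn:ignore.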


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Lee Faris (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466672 ] 

Lee Faris commented on HADOOP-884:
----------------------------------

I was thinking more along the lines of calling the EC2 web service directly via Java.  The command line tools are thin wrappers around the web service.  


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "James P. White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465849 ] 

James P. White commented on HADOOP-884:
---------------------------------------

I'm quite sure the solution to the DNS problem is Zeroconf.

http://www.ifcx.org/wiki/LocalNetworking.html

http://zeroconf.org/

Amazon is already using it for the parameterized launch.  That's where the funny "169.254.169.254" address comes from.

http://docs.amazonwebservices.com/AmazonEC2/dg/2006-10-01/TechnicalFAQ.html#d0e14061

There are several ways that this can be approached.  The one that would help the most people would be to make Hadoop Zeroconf-aware (slaves using service discovery to find the master), but probably the place to start is to just enhance these EC2 scripts.  
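
As a small aside on that address: 169.254.169.254 falls inside 169.254.0.0/16, the IPv4 link-local range that Zeroconf-style addressing draws from. The check below is only a sketch using Python's standard ipaddress module; the helper name is made up for illustration and it says nothing about how Amazon implements the service.

```python
import ipaddress

# IPv4 link-local range (RFC 3927).
LINK_LOCAL_NETWORK = ipaddress.ip_network("169.254.0.0/16")


def is_link_local(address):
    """Return True if the IPv4 address lies in the link-local range."""
    return ipaddress.ip_address(address) in LINK_LOCAL_NETWORK


if __name__ == "__main__":
    print(is_link_local("169.254.169.254"))
```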


[jira] Updated: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-884:
-----------------------------

    Attachment: hadoop-884.patch

The attached patch includes the Hadoop EC2 scripts in contrib/ec2/bin. I think they are ready for inclusion in the main distribution now.

I have extended the scripts since the version in the tar.gz file by making them more robust: they no longer have to be unpacked and invoked from the user's home directory. More significantly, I have used a parameterized launch to set cluster size and master hostname. Previously, you had to build an image for a particular cluster size and hostname; now you can build one image and choose the cluster size and hostname at launch time. (This is a step towards shared Hadoop images.)
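
For readers who have not seen the patch, the instance side of a parameterized launch might be sketched as follows. This assumes the launch parameters arrive as a single comma-separated string of the form "cluster_size,master_host"; that format and the function name are assumptions for illustration and may not match what the scripts in the patch actually do.

```python
def parse_launch_parameters(user_data):
    """Split 'cluster_size,master_host' into an (int, str) pair.

    The string format is an assumption made for this sketch.
    """
    parts = [part.strip() for part in user_data.strip().split(",")]
    if len(parts) != 2:
        raise ValueError("expected 'cluster_size,master_host'")
    size_text, master_host = parts
    if not size_text.isdigit() or int(size_text) < 1:
        raise ValueError("cluster size must be a positive integer")
    if not master_host:
        raise ValueError("master hostname must not be empty")
    return int(size_text), master_host


if __name__ == "__main__":
    print(parse_launch_parameters("4,master.example.com"))
```

Validating the string up front means a mistyped launch parameter fails early on the instance instead of producing a half-configured cluster.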

As for the other improvements, I will create new Jira issues for them, since the basic scripts are in a working state (although I would love feedback if anyone tries them out).

James - thank you for the suggestion about Zeroconf. I've not had any experience with it, so any help would be appreciated.


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12465890 ] 

Doug Cutting commented on HADOOP-884:
-------------------------------------

Please mark this as "Patch Available" when you feel these scripts are ready for inclusion.  Hopefully they'll make the 0.11 release in two weeks.


[jira] Commented: (HADOOP-884) Create scripts to run Hadoop on Amazon EC2

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466679 ] 

Tom White commented on HADOOP-884:
----------------------------------

I agree that long term it would be more efficient to call the EC2 web service via Java, and these scripts could be the basis for this. At the moment, I'm focusing on getting the scripts working smoothly. 
