Posted to common-dev@hadoop.apache.org by "Tom White (JIRA)" <ji...@apache.org> on 2007/01/29 22:44:49 UTC

[jira] Created: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Create a public (shared) Hadoop EC2 AMI
---------------------------------------

                 Key: HADOOP-952
                 URL: https://issues.apache.org/jira/browse/HADOOP-952
             Project: Hadoop
          Issue Type: Improvement
          Components: scripts
    Affects Versions: 0.11.0
            Reporter: Tom White
         Assigned To: Tom White


HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474051 ] 

James P. White commented on HADOOP-952:
---------------------------------------

I tried out the v4-tar scripts and they work fine.  The choice of the "start-" prefix was reflecting that the cluster nodes were being started, but obviously your choice is fine.

One thing that might be worthwhile adding is a HADOOP_HOME setting in the AMI perhaps with a PATH update too.  That could be with a "/root/hadoop-env.sh" file and/or a "/usr/local/hadoop-current" symlink or the like.

I see how the HADOOP_VERSION in the local env.sh works and selects the right AMI, and maybe avoiding the duplicate settings is the right thing, but the way it is now, it doesn't "feel" like Hadoop is "installed" in the AMI.  But since I'm a cluster newbie, this may be something I'll change my mind on.
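
The HADOOP_HOME idea above could look something like the following minimal sketch. It is only an assumption about how such a setup might be done, not what the AMI ships. A scratch directory (PREFIX) stands in for the image's root filesystem so the snippet is safe to run anywhere; the file name and symlink name are the ones floated in the comment.

```shell
#!/bin/sh
# Hypothetical sketch: a version-independent symlink plus an env file.
# PREFIX is a scratch directory standing in for the image's / directory.
PREFIX=$(mktemp -d)
HADOOP_VERSION=0.11.1

# Stand-in for the unpacked release (/usr/local/hadoop-0.11.1 on the AMI).
mkdir -p "$PREFIX/usr/local/hadoop-$HADOOP_VERSION/bin"

# Symlink whose name does not change between releases.
ln -sfn "$PREFIX/usr/local/hadoop-$HADOOP_VERSION" "$PREFIX/usr/local/hadoop-current"

# Env file (would be /root/hadoop-env.sh) that a login shell could source.
mkdir -p "$PREFIX/root"
cat > "$PREFIX/root/hadoop-env.sh" <<EOF
export HADOOP_HOME="$PREFIX/usr/local/hadoop-current"
export PATH="\$HADOOP_HOME/bin:\$PATH"
EOF

. "$PREFIX/root/hadoop-env.sh"
echo "HADOOP_HOME=$HADOOP_HOME"
```

With something like this in place, only the symlink target would need to change when a new Hadoop version is installed.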

And speaking of that, the Hadoop version thing being in those jar file names seems like a problem too. 

Any notion what's wrong with my attempt to run "bin/hadoop jar hadoop-0.11.1-test.jar DFSCIOTest -write"?





[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James P. White updated HADOOP-952:
----------------------------------

    Attachment: hadoop-952-jim-v2.patch

I was in a rush and hadn't tested the refactored run-hadoop-cluster so of course it was quite broken.  The patch for the fixed and tested version is attached.



[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James P. White updated HADOOP-952:
----------------------------------

    Attachment: ec2-ami-bin-v2.tar

Ditto to the above - tar of scripts with jim-v2 patch applied.



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468427 ] 

Tom White commented on HADOOP-952:
----------------------------------

I was planning on using my S3 storage - at least until the AMI got too popular :)



[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-952:
-----------------------------

    Attachment: hadoop-952.patch

This patch includes changes to the EC2 scripts to support creation of public AMIs. The main changes tighten up security; there is a good checklist at http://docs.amazonwebservices.com/AmazonEC2/dg/2006-10-01/public-ami-guidelines.html. The important thing is to clear out keys before bundling the image. Also, since the Hadoop AMIs were previously private, it was OK to create a new SSH key for the cluster and embed it in the image. This is now a big no-no, since it would allow people to connect to someone else's cluster! Instead, your EC2 keypair is used for password-less logins across the cluster.
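
As an illustration of "clearing out keys before bundling", here is a minimal sketch. The list of files is an assumption chosen for the example and is not taken from the patch; the function name clean_image_root is hypothetical, and it takes the root of the image filesystem as its argument.

```shell
#!/bin/sh
# Hypothetical pre-bundling cleanup; the file list is an assumption,
# not the contents of the attached patch.
clean_image_root() {
  root=$1
  # Refuse to run without an explicit image root.
  [ -n "$root" ] || return 1
  # Keys that would let the image builder log in to instances started from the image.
  rm -f "$root/root/.ssh/authorized_keys"
  # SSH host keys, so instances do not all share one host identity.
  rm -f "$root"/etc/ssh/ssh_host_*key*
  # Shell history may contain credentials typed during the build.
  rm -f "$root/root/.bash_history"
}
```

It would be called with the mount point of the image being prepared, for example a (hypothetical) /mnt/image-root. Pointing it at a live system's / would remove that system's own keys.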

Before publishing some images, it would be great if someone could test a private image I have created and sanity check the setup. I'll grant access using the mechanism described here: http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured. So, if you have an EC2 account and would like to help please email me (off list) with your AWS account ID (note this is _not_ either of your access keys). 

After this I'll create a public image.





[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12473004 ] 

James P. White commented on HADOOP-952:
---------------------------------------

So now the startup scripts are all lovely and I can run the Pi example.  While trying to find other tests to run, I came up with:

[root@domU-12-31-34-00-02-B4 hadoop-0.11.1]# bin/hadoop jar hadoop-0.11.1-test.jar DFSCIOTest -write
DFSCIOTest.0.0.1
07/02/14 01:24:03 INFO mapred.InputFormatBase: nrFiles = 1
07/02/14 01:24:03 INFO mapred.InputFormatBase: fileSize (MB) = 1
07/02/14 01:24:03 INFO mapred.InputFormatBase: bufferSize = 1000000
/usr/local/hadoop-0.11.1/libhdfs/libhdfs.so.1: No such file or directory

That looks like some LIBPATH problem.
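
A quick way to tell a missing file apart from a loader-path problem might be the sketch below. It assumes the path printed in the error message; the LD_LIBRARY_PATH handling is a guess at what a "LIBPATH" fix could look like and is not a confirmed remedy.

```shell
#!/bin/sh
# Sketch: check whether the library named in the error exists at all.
# The default path is taken from the error message above.
HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-0.11.1}
LIB="$HADOOP_HOME/libhdfs/libhdfs.so.1"

if [ -e "$LIB" ]; then
  status=present
  # File exists, so make its directory visible to the dynamic loader.
  LD_LIBRARY_PATH="$HADOOP_HOME/libhdfs${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
  export LD_LIBRARY_PATH
else
  # File is absent, so no search-path setting would help.
  status=missing
fi
echo "libhdfs.so.1 is $status at $LIB"
```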




[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-952:
-----------------------------

    Attachment: hadoop-952-v3.patch

New patch (v3) that fixes a security hole uncovered by Jim. There still seems to be a problem in Jim's setup which is producing an ArithmeticException (see HADOOP-1013).



[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-952:
-----------------------------

       Resolution: Fixed
    Fix Version/s: 0.12.0
           Status: Resolved  (was: Patch Available)

I've just committed this.

> Create a public (shared) Hadoop EC2 AMI
> ---------------------------------------
>
>                 Key: HADOOP-952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-952
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.11.2
>            Reporter: Tom White
>         Assigned To: Tom White
>             Fix For: 0.12.0
>
>         Attachments: ec2-ami-bin-v2.tar, ec2-ami-bin.tar, hadoop-952-jim-v2.patch, hadoop-952-jim.patch, hadoop-952-v2.patch, hadoop-952-v3.patch, hadoop-952-v4.patch, hadoop-952-v4.tar, hadoop-952.patch
>
>
> HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474280 ] 

James P. White commented on HADOOP-952:
---------------------------------------

+1 on committing.  Dealing with patches makes this code hard to work on.

Issues are cheap (especially when they get closed), so opening new ones for enhancements and the like is probably a good idea.  I like the project's approach of keeping most discussion in Jira (because posts on the list will be lost to future developers/users whereas Jira issues keep it all together).  The unclear bit is when it makes sense to put somewhat-off-the-issue's-topic-but-related material in a different issue or list posting.

So whenever you're ready to close this issue and move on is fine with me.




[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-952:
-----------------------------

    Attachment: hadoop-952-v2.patch

Jim,

Thanks for giving the scripts a whirl. Looks like you may have been using the unpatched scripts as the patch fixes the variables pretty much in the way you suggest (sorry if I wasn't clearer in explaining how to use them). Nevertheless I've improved the wording in the set up file further, and I've included your handy login script in a new patch. (I didn't include your rerun script, as I hope this won't be needed too much.)

The ArithmeticException is a mystery to me as I haven't been able to reproduce it, which is odd given that we are running the same AMI. Could you run any of the other examples? Also it might be worth looking in the log files to see if anything else failed.
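
For the log check suggested above, something along these lines could be used. This is a sketch only: the default log directory is an assumption based on the install path seen elsewhere in this issue, and scan_logs is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list log lines that mention an error or exception.
# LOG_DIR default is an assumption based on the install path in this issue.
LOG_DIR=${LOG_DIR:-/usr/local/hadoop-0.11.0/logs}

scan_logs() {
  # /dev/null is passed as an extra file so grep always prints file names.
  grep -n -E 'ERROR|FATAL|Exception' "$1"/*.log /dev/null 2>/dev/null
}

scan_logs "$LOG_DIR" || echo "No matching lines (or no logs) in $LOG_DIR"
```

Each hit is printed as file:line:text, which makes it easy to see which daemon's log reported a failure.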



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470375 ] 

Tom White commented on HADOOP-952:
----------------------------------

Actually, it is not true that embedding a private SSH key in the image would allow people to connect to other clusters: the cluster runs in a security group that only allows other machines in the cluster or a given owner to connect. (See the ec2-authorize command.) However, it is still a bad idea to embed a private SSH key in an image, in case people get their security groups misconfigured.
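
To make the security group point concrete, the following dry run prints the kind of ec2-authorize calls that would set up such a group. It only echoes the commands and does not call the EC2 tools. The flags shown (-p for a port, -o and -u for a source group and its owner) are my recollection of the EC2 command-line tools and should be checked against the tool's own usage text; AWS_ACCOUNT_ID is a placeholder value.

```shell
#!/bin/sh
# Dry run: prints commands instead of executing them.
# GROUP matches the settings discussed in this issue; AWS_ACCOUNT_ID is a placeholder.
GROUP=${GROUP:-hadoop-cluster-group}
AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID:-000000000000}

print_group_rules() {
  # Open the SSH port; logging in still requires the owner's keypair.
  echo "ec2-authorize $GROUP -p 22"
  # Allow instances in the same group (same owner) to reach each other.
  echo "ec2-authorize $GROUP -o $GROUP -u $AWS_ACCOUNT_ID"
}

print_group_rules
```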



[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-952:
-----------------------------

    Affects Version/s:     (was: 0.11.0)
                       0.11.2
               Status: Patch Available  (was: Open)



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474601 ] 

Doug Cutting commented on HADOOP-952:
-------------------------------------

+1 This looks good to me.



[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James P. White updated HADOOP-952:
----------------------------------

    Attachment: hadoop-952-jim.patch

This patch implements the changes I suggested, including a sanity check at the initialization step that SSH to the master works.
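
A sanity check of this kind might be sketched as follows. This is not the code from the attached patch: check_master_ssh and SSH_CMD are hypothetical names, and SSH_CMD exists only so the function can be exercised without a real host. SSH_OPTS is the variable from the settings file.

```shell
#!/bin/sh
# Hypothetical sanity check; not the code from the attached patch.
# SSH_CMD is overridable so the function can be tried offline.
SSH_CMD=${SSH_CMD:-ssh}

check_master_ssh() {
  master=$1
  # SSH_OPTS is left unquoted on purpose so it splits into separate options.
  # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the wait.
  if $SSH_CMD $SSH_OPTS -o BatchMode=yes -o ConnectTimeout=10 "root@$master" true; then
    echo "SSH to $master works"
  else
    echo "Cannot SSH to $master; check EC2_KEYDIR and SSH_OPTS" >&2
    return 1
  fi
}
```

Running check_master_ssh with the master's host name before starting the daemons would surface key or option mistakes early.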



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12471973 ] 

James P. White commented on HADOOP-952:
---------------------------------------

Hi Tom!

You wrote:

> ...
> Any problems or questions, give me a shout! (Let me know how it goes 
> anyway.)

I've gotten set up on EC2 and gave your image a whirl.

The biggest problem I had was figuring out the S3_BUCKET.  

I got HADOOP_VERSION wrong a couple times.

I also spent a while getting the EC2_KEYDIR and SSH_OPTS set to use my scheme.

These are the settings I wound up with:

# The Amazon S3 bucket where the Hadoop AMI you create will be stored.
S3_BUCKET=hadoop-ec2-images

# Location of EC2 keys.
# The default setting is probably OK if you set up EC2 following the Amazon Getting Started guide.
EC2_KEYDIR=`dirname "$EC2_PRIVATE_KEY"`

# SSH options used when connecting to EC2 instances.
# Change the -i option to be the absolute path to your keypair that you set up in the Amazon Getting Started guide.
SSH_OPTS=`echo -i "$EC2_KEYDIR"/id_rsa-gsg-keypair -o StrictHostKeyChecking=no`

# The download URL for the Sun JDK. Visit http://java.sun.com/javase/downloads/index_jdk5.jsp and get the URL for the "Linux self-extracting file".
JAVA_BINARY_URL=''

# The version number of the installed JDK.
JAVA_VERSION=1.5.0_11

# The EC2 group to run your cluster in.
GROUP=hadoop-cluster-group

# The version of Hadoop to install.
HADOOP_VERSION=0.11.0

I think those are somewhat better defaults.  The others are much more self-explanatory.
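
Since several of these settings were easy to get wrong, a small pre-flight check could catch empty values before any instances are started. The sketch below is an assumption about how that might be done (check_settings is a hypothetical helper); it only detects empty or unset values and cannot tell whether a HADOOP_VERSION or S3_BUCKET value is the right one.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the settings file.
# Returns non-zero if any named variable is unset or empty.
check_settings() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name-}"
    if [ -z "$value" ]; then
      echo "Setting $name is empty" >&2
      missing=1
    fi
  done
  return $missing
}
```

For example, after sourcing the settings file: check_settings S3_BUCKET GROUP HADOOP_VERSION || exit 1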

I also had to rerun the run-cluster code following the "Waiting before ..." point multiple times to get the settings worked out, so I made a shortened version (rerun-).  I also made a login script (which turns out to be a good test before doing the "Creating instances..." business).

I then tried to run the pi sample job per the wiki page, but got an exception:

[root@domU-12-31-34-00-03-2F ~]# cd /usr/local/hadoop-0.11.0/
[root@domU-12-31-34-00-03-2F hadoop-0.11.0]# bin/hadoop jar hadoop-0.11.0-examples.jar pi 10 10000000
Number of Maps = 10 Samples per Map = 10000000
org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArithmeticException: / by zero
        at org.apache.hadoop.dfs.FSNamesystem$Replicator.chooseTarget(FSNamesystem.java:2593)
        at org.apache.hadoop.dfs.FSNamesystem$Replicator.chooseTarget(FSNamesystem.java:2555)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:684)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:248)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:337)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:538)

        at org.apache.hadoop.ipc.Client.call(Client.java:467)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
        at org.apache.hadoop.dfs.$Proxy0.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateNewBlock(DFSClient.java:1091)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:1031)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.endBlock(DFSClient.java:1255)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.close(DFSClient.java:1345)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.fs.FSDataOutputStream$Summer.close(FSDataOutputStream.java:98)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
        at org.apache.hadoop.io.SequenceFile$Writer.close(SequenceFile.java:724)
        at org.apache.hadoop.examples.PiEstimator.launch(PiEstimator.java:185)
        at org.apache.hadoop.examples.PiEstimator.main(PiEstimator.java:226)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:40)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
[root@domU-12-31-34-00-03-2F hadoop-0.11.0]# 





[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-952:
-----------------------------

    Attachment: hadoop-952-v4.patch

This v4 patch applies all of Jim's changes, but with the following differences:
 * Reinstate some changes in the create-hadoop-image script that had been lost.
 * Rename start-hadoop-cluster to launch-hadoop-cluster, and init-hadoop-cluster to start-hadoop.
 * Add a top-level script, hadoop-ec2, for running commands, and which provides simple usage instructions.

Once this is committed, I will make the AMIs public and update the wiki instructions.
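
The top-level script idea can be illustrated with a small dispatcher. This is a sketch under assumptions and not the hadoop-ec2 script from the patch: it accepts the script names mentioned in this issue as commands, and it only prints what it would run. BIN_DIR and the function name hadoop_ec2 are hypothetical.

```shell
#!/bin/sh
# Hypothetical dispatcher; prints what it would run instead of running it.
BIN_DIR=${BIN_DIR:-./bin}

hadoop_ec2() {
  if [ $# -eq 0 ]; then
    echo "Usage: hadoop-ec2 COMMAND [ARGS]"
    echo "Commands: create-hadoop-image launch-hadoop-cluster start-hadoop"
    return 1
  fi
  cmd=$1
  shift
  case "$cmd" in
    create-hadoop-image|launch-hadoop-cluster|start-hadoop)
      echo "would run: $BIN_DIR/$cmd $*"
      ;;
    *)
      echo "Unknown command: $cmd" >&2
      return 1
      ;;
  esac
}
```

Calling hadoop_ec2 with no arguments prints the usage text, which is the "simple usage instructions" part of the idea.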


> Create a public (shared) Hadoop EC2 AMI
> ---------------------------------------
>
>                 Key: HADOOP-952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-952
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.11.0
>            Reporter: Tom White
>         Assigned To: Tom White
>         Attachments: ec2-ami-bin-v2.tar, ec2-ami-bin.tar, hadoop-952-jim-v2.patch, hadoop-952-jim.patch, hadoop-952-v2.patch, hadoop-952-v3.patch, hadoop-952-v4.patch, hadoop-952.patch
>
>
> HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12474273 ] 

Tom White commented on HADOOP-952:
----------------------------------

Jim, glad the scripts work for you. Would you be happy for the changes to be committed? I feel the enhancements you mention belong in a separate Jira issue. (I've been thinking about how to manage various versions of Hadoop AMIs, so it would be good to take this further.)

I'll look at the DFSCIOTest problem too.

> Create a public (shared) Hadoop EC2 AMI
> ---------------------------------------
>
>                 Key: HADOOP-952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-952
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.11.0
>            Reporter: Tom White
>         Assigned To: Tom White
>         Attachments: ec2-ami-bin-v2.tar, ec2-ami-bin.tar, hadoop-952-jim-v2.patch, hadoop-952-jim.patch, hadoop-952-v2.patch, hadoop-952-v3.patch, hadoop-952-v4.patch, hadoop-952-v4.tar, hadoop-952.patch
>
>
> HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.



[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James P. White updated HADOOP-952:
----------------------------------

    Attachment: ec2-ami-bin.tar

This is a tar of the src/contrib/ec2 directory with my patch applied.  This would be helpful to someone who wanted to do the minimum to get started on EC2+Hadoop.

> Create a public (shared) Hadoop EC2 AMI
> ---------------------------------------
>
>                 Key: HADOOP-952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-952
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.11.0
>            Reporter: Tom White
>         Assigned To: Tom White
>         Attachments: ec2-ami-bin.tar, hadoop-952-jim.patch, hadoop-952-v2.patch, hadoop-952-v3.patch, hadoop-952.patch
>
>
> HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.



[jira] Updated: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-952:
-----------------------------

    Attachment: hadoop-952-v4.tar

A corresponding tar file of changes: hadoop-952-v4.tar.

> Create a public (shared) Hadoop EC2 AMI
> ---------------------------------------
>
>                 Key: HADOOP-952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-952
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.11.0
>            Reporter: Tom White
>         Assigned To: Tom White
>         Attachments: ec2-ami-bin-v2.tar, ec2-ami-bin.tar, hadoop-952-jim-v2.patch, hadoop-952-jim.patch, hadoop-952-v2.patch, hadoop-952-v3.patch, hadoop-952-v4.patch, hadoop-952-v4.tar, hadoop-952.patch
>
>
> HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "James P. White (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12472903 ] 

James P. White commented on HADOOP-952:
---------------------------------------

I've applied the v3 patch and tried out the new AMI, which successfully ran the "pi" example.

Had some trouble getting the SSH settings right again.  I need env.sh to look like this:

# Location of EC2 keys.
# The default setting is probably OK if you set up EC2 following the Amazon Getting Started guide.
EC2_KEYDIR=`dirname "$EC2_PRIVATE_KEY"`

# The EC2 key name used to launch instances.
# The default is the value used in the Amazon Getting Started guide.
KEY_NAME=gsg-keypair

# Where your EC2 private key is stored (created when following the Amazon Getting Started guide).
PRIVATE_KEY_PATH=`echo "$EC2_KEYDIR"/"id_rsa-$KEY_NAME"`

# SSH options used when connecting to EC2 instances.
SSH_OPTS=`echo -i "$PRIVATE_KEY_PATH" -o StrictHostKeyChecking=no`

The reason for the `echo ...` business is that I need paths with embedded spaces to work.
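
One alternative way to keep paths with embedded spaces intact, offered only as a sketch and not as what the patch does, is to skip the flattened SSH_OPTS string and wrap the ssh call in a function so that every option is passed as its own quoted argument. The function names private_key_path and ssh_ec2 are hypothetical; EC2_PRIVATE_KEY and KEY_NAME are the variables from the env.sh shown above.

```shell
# Sketch (POSIX sh): pass SSH options as separate quoted arguments
# instead of storing them in a single string.
# private_key_path and ssh_ec2 are hypothetical helper names.

# Derive the private key path from EC2_PRIVATE_KEY and KEY_NAME,
# following the same convention as the env.sh above.
private_key_path() {
  printf '%s/id_rsa-%s\n' "$(dirname "$EC2_PRIVATE_KEY")" "$KEY_NAME"
}

# Run ssh with the EC2 key. Because "$(private_key_path)" is quoted,
# a key directory containing spaces still reaches ssh as one argument.
ssh_ec2() {
  ssh -i "$(private_key_path)" -o StrictHostKeyChecking=no "$@"
}

# Example call (the host name is a placeholder):
#   ssh_ec2 root@master.example.invalid uptime
```

The trade-off is that callers must use the function rather than expanding a variable, but no word splitting ever happens on the key path.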

Also, I really think 'run-hadoop-cluster' should be split in two.  The part where it waits for DynDNS to be set up should simply end, and the second part should be a separate script.  A user with a new setup would also be advised to run "login-hadoop-cluster" before running the second part to verify the settings.
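
To make the proposed split concrete, the first script could end with a bounded wait like the one sketched here, leaving the Hadoop start-up to a second script that the user runs once login works. This is an illustration under assumptions, not code from any of the attached patches: wait_until is a hypothetical helper, and the check command and MASTER_HOST variable in the usage comment are placeholders.

```shell
# Sketch (POSIX sh): retry a check a bounded number of times, then give up.
# wait_until is a hypothetical helper, not part of the contrib/ec2 scripts.

# wait_until MAX_TRIES DELAY_SECONDS COMMAND [ARGS]
# Runs COMMAND until it succeeds, sleeping DELAY_SECONDS between attempts.
# Returns 0 on success, 1 if COMMAND still fails after MAX_TRIES attempts.
wait_until() {
  max_tries=$1
  delay=$2
  shift 2
  try=1
  while ! "$@"; do
    if [ "$try" -ge "$max_tries" ]; then
      return 1
    fi
    try=$((try + 1))
    sleep "$delay"
  done
}

# Example: the first script could finish with something like
#   wait_until 30 10 host "$MASTER_HOST" || exit 1
# where MASTER_HOST is a placeholder for the cluster's DNS name.
```

Ending the first script at this point gives the user a natural place to run login-hadoop-cluster and check the SSH settings before starting Hadoop.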



> Create a public (shared) Hadoop EC2 AMI
> ---------------------------------------
>
>                 Key: HADOOP-952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-952
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.11.0
>            Reporter: Tom White
>         Assigned To: Tom White
>         Attachments: hadoop-952-v2.patch, hadoop-952-v3.patch, hadoop-952.patch
>
>
> HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.



[jira] Commented: (HADOOP-952) Create a public (shared) Hadoop EC2 AMI

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468426 ] 

Doug Cutting commented on HADOOP-952:
-------------------------------------

This would be great to have!  Someone would need to donate the S3 storage for these images, but that should be pretty cheap.

> Create a public (shared) Hadoop EC2 AMI
> ---------------------------------------
>
>                 Key: HADOOP-952
>                 URL: https://issues.apache.org/jira/browse/HADOOP-952
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 0.11.0
>            Reporter: Tom White
>         Assigned To: Tom White
>
> HADOOP-884 makes it easy to run Hadoop on an EC2 cluster, but building an AMI (Amazon Machine Image) can take a little while. Amazon EC2 supports shared AMIs (http://developer.amazonwebservices.com/connect/entry.jspa?entryID=530&ref=featured), so we could provide publicly available AMIs for each Hadoop release.
