Posted to common-user@hadoop.apache.org by Stephen Watt <sw...@us.ibm.com> on 2008/09/29 21:19:51 UTC
Getting Hadoop Working on EC2/S3
Hi Folks
Before I get started, I just want to state that I've done the due
diligence: I've read Tom White's articles as well as the EC2 and S3 pages
on the Hadoop Wiki, and done some searching on this.
Thus far I have successfully got Hadoop running on EC2 with no problems.
In my local Hadoop 0.18 environment I simply add my AWS keys to
hadoop-ec2-env.sh and kick off the src/contrib/ec2/bin/hadoop-ec2
launch-cluster script, and it works great.
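For reference, the setup amounts to something like this (placeholder
values; the variable names are the ones I recall from the 0.18 contrib
script, so double-check them against your copy):

    # in src/contrib/ec2/bin/hadoop-ec2-env.sh (placeholder values)
    AWS_ACCOUNT_ID=111122223333        # your AWS account ID, without dashes
    AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEY
    AWS_SECRET_ACCESS_KEY=exampleSecretKey
    KEY_NAME=gsg-keypair               # name of your EC2 SSH keypair

    # then, from the Hadoop install directory:
    src/contrib/ec2/bin/hadoop-ec2 launch-cluster my-cluster 2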
Now, I'm trying to use the public Hadoop EC2 images to run over S3 instead
of HDFS. They are set up to use variables passed in from a parameterized
launch for all the config options EXCEPT fs.default.name. So in order to
bring up a cluster of 20 Hadoop instances that run over S3, I need to
modify the config file to point fs.default.name at my S3 bucket and keep
the rest the same. Thus I need my own image to do this. I am attempting
this by using the local src/contrib/ec2/bin/hadoop-ec2 create-image
script. I've tried this both on a Windows system (Cygwin environment) AND
on my Ubuntu 8 system, and each time it gets all the way to the end and
fails as it attempts to save the new image to my bucket, saying the bucket
does not exist, with a Server.NoSuchBucket (404) error.
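To be concrete, the only override I need baked into the image is the
default filesystem property, along these lines (the bucket name is a
placeholder):

    <!-- hadoop-site.xml on the image: default FS on S3 instead of HDFS -->
    <property>
      <name>fs.default.name</name>
      <value>s3://my-hadoop-bucket</value>
    </property>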
The S3 bucket definitely does exist. It has block data inside it that is
the output of my Hadoop jobs. I can go to a single Hadoop image on EC2
that I've launched and manually set up to use S3, run bin/hadoop dfs -ls /,
and see the contents of my S3 bucket. I can also successfully use that S3
bucket as the input and output of my jobs on a single EC2 Hadoop instance.
I've tried creating new buckets using the Firefox S3 Organizer plugin and
pointing the scripts at those to save my new image, and it's still the
same error.
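In other words, from a manually configured instance the bucket is
perfectly reachable, e.g. (bucket name is a placeholder; the AWS keys are
set in hadoop-site.xml rather than embedded in the URI):

    # on a single manually configured EC2 instance; both of these work
    bin/hadoop dfs -ls /
    bin/hadoop fs -ls s3://my-hadoop-bucket/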
Any ideas? Is anyone having similar problems?
Regards
Steve Watt
Re: Getting Hadoop Working on EC2/S3
Posted by Stephen Watt <sw...@us.ibm.com>.
I think we've identified a bug with the create-image command in the EC2
scripts under src/contrib.
This was my workaround:
1) Start a single instance of the Hadoop AMI you want to modify using the
ElasticFox Firefox plugin (or the ec2-tools).
2) Modify the /root/hadoop-init script and change the fs.default.name
property to point to the FULL s3:// path to your bucket (after doing this,
make sure you do not make your image public!).
3) Follow the instructions at
http://docs.amazonwebservices.com/AWSEC2/2008-05-05/GettingStartedGuide/
for bundling, uploading and registering your new AMI (a rough sketch
follows after step 4).
4) On your local machine, in the hadoop-ec2-env.sh file, change S3_BUCKET
to point to the private S3 bucket where you uploaded your new image, and
change HADOOP_VERSION to your new AMI name.
You can now go to your command prompt and say "bin/hadoop-ec2
launch-cluster myClusterName 5", and it will bring up 5 instances in a
Hadoop cluster all running off your S3 bucket instead of HDFS.
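For anyone who wants steps 2 through 4 in more concrete form, here is a
rough sketch. The bucket names, account ID, and file paths are
placeholders, so check the Amazon guide above for the exact AMI tools
flags:

    # 2) on the running instance: edit /root/hadoop-init so the generated
    #    config points fs.default.name at the full s3:// path of your
    #    bucket, e.g. <value>s3://my-hadoop-bucket</value>

    # 3) bundle and upload on the instance, then register from your local
    #    machine (paths and account ID are placeholders)
    ec2-bundle-vol -d /mnt -k /mnt/pk.pem -c /mnt/cert.pem -u 111122223333
    ec2-upload-bundle -b my-ami-bucket -m /mnt/image.manifest.xml \
        -a $AWS_ACCESS_KEY_ID -s $AWS_SECRET_ACCESS_KEY
    ec2-register my-ami-bucket/image.manifest.xml

    # 4) on your local machine, in hadoop-ec2-env.sh:
    #    S3_BUCKET=my-ami-bucket
    #    HADOOP_VERSION=<your new AMI name>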
Kind regards
============================================
Steve Watt
IBM Certified IT Architect
Open Group Certified Master IT Architect
Tel: (512) 286 - 9170
Tie: 363 - 9170
Emerging Technologies, Austin, TX
IBM Software Group
============================================
Re: Getting Hadoop Working on EC2/S3
Posted by Alexander Aristov <al...@gmail.com>.
Does your AWS (S3) secret key contain the "?" sign? If so, that can be the
cause. Regenerate the key in this case.
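If that is the problem, another way to sidestep it is to keep the
credentials out of the s3:// URI entirely and set them as config
properties instead; if I remember the property names right, something
like:

    <!-- hadoop-site.xml: pass the S3 credentials as properties
         instead of embedding them in the s3:// URI -->
    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY_ID</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>YOUR_SECRET_ACCESS_KEY</value>
    </property>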
I have also tried to use the create-image command, but I stopped all
attempts after constant failures. It was easier to make the AMI by hand.
Alexander
--
Best Regards
Alexander Aristov