Posted to common-user@hadoop.apache.org by Stephen Watt <sw...@us.ibm.com> on 2008/09/29 21:19:51 UTC
Getting Hadoop Working on EC2/S3
Hi Folks
Before I get started, I just want to state that I've done the due
diligence: I've read Tom White's articles as well as the EC2 and S3 pages
on the Hadoop Wiki, and done some searching on this.
Thus far I have successfully got Hadoop running on EC2 with no problems.
In my local Hadoop 0.18 environment I simply add my AWS keys to
hadoop-ec2-env.sh and kick off the src/contrib/ec2/bin/hadoop-ec2
launch-cluster script, and it works great.
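For reference, the setup amounts to something like this (placeholder
values; the variable names are the ones I recall from the 0.18 contrib
script, so double-check them against your copy):

    # in src/contrib/ec2/bin/hadoop-ec2-env.sh (placeholder values)
    AWS_ACCOUNT_ID=111122223333        # your AWS account ID, without dashes
    AWS_ACCESS_KEY_ID=AKIAEXAMPLEKEY
    AWS_SECRET_ACCESS_KEY=exampleSecretKey
    KEY_NAME=gsg-keypair               # name of your EC2 SSH keypair

    # then, from the Hadoop install directory:
    src/contrib/ec2/bin/hadoop-ec2 launch-cluster my-cluster 2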
Now, I'm trying to use the public Hadoop EC2 images to run over S3 instead
of HDFS. They are set up to use variables passed in from a parameterized
launch for all the config options EXCEPT fs.default.name. So in order to
bring up a cluster of 20 Hadoop instances that run over S3, I need to
modify the config file to point fs.default.name at my S3 bucket and keep
the rest the same. Thus I need my own image to do this. I am attempting
this by using the local src/contrib/ec2/bin/hadoop-ec2 create-image
script. I've tried this both on a Windows system (Cygwin environment) AND
on my Ubuntu 8 system, and each time it gets all the way to the end and
fails as it attempts to save the new image to my bucket, saying the bucket
does not exist, with a Server.NoSuchBucket (404) error.
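To be concrete, the only override I need baked into the image is the
default filesystem property, along these lines (the bucket name is a
placeholder):

    <!-- hadoop-site.xml on the image: default FS on S3 instead of HDFS -->
    <property>
      <name>fs.default.name</name>
      <value>s3://my-hadoop-bucket</value>
    </property>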
The S3 bucket definitely does exist. It has block data inside it that is
the output of my Hadoop jobs. I can go to a single Hadoop image on EC2
that I've launched and manually set up to use S3, run bin/hadoop dfs -ls /,
and see the contents of my S3 bucket. I can also successfully use that S3
bucket as the input and output of my jobs on a single EC2 Hadoop instance.
I've tried creating new buckets using the Firefox S3 Organizer plugin and
pointing the scripts at those to save my new image, and it's still the
same error.
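In other words, from a manually configured instance the bucket is
perfectly reachable, e.g. (bucket name is a placeholder; the AWS keys are
set in hadoop-site.xml rather than embedded in the URI):

    # on a single manually configured EC2 instance; both of these work
    bin/hadoop dfs -ls /
    bin/hadoop fs -ls s3://my-hadoop-bucket/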
Any ideas? Is anyone having similar problems?
Regards
Steve Watt
Re: Getting Hadoop Working on EC2/S3
Posted by Stephen Watt <sw...@us.ibm.com>.
I think we've identified a bug with the create-image command in the EC2
scripts under src/contrib.
This was my workaround:
1) Start a single instance of the Hadoop AMI you want to modify using the
ElasticFox Firefox plugin (or the ec2-tools).
2) Modify the /root/hadoop-init script and change the fs.default.name
property to point to the FULL s3:// path to your bucket (after doing this,
make sure you do not make your image public!).
3) Follow the instructions at
http://docs.amazonwebservices.com/AWSEC2/2008-05-05/GettingStartedGuide/
for bundling, uploading and registering your new AMI (a rough sketch
follows after step 4).
4) On your local machine, in the hadoop-ec2-env.sh file, change S3_BUCKET
to point to the private S3 bucket where you uploaded your new image, and
change HADOOP_VERSION to your new AMI name.
You can now go to your command prompt and say "bin/hadoop-ec2
launch-cluster myClusterName 5", and it will bring up 5 instances in a
Hadoop cluster all running off your S3 bucket instead of HDFS.
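For anyone who wants steps 2 through 4 in more concrete form, here is a
rough sketch. The bucket names, account ID, and file paths are
placeholders, so check the Amazon guide above for the exact AMI tools
flags:

    # 2) on the running instance: edit /root/hadoop-init so the generated
    #    config points fs.default.name at the full s3:// path of your
    #    bucket, e.g. <value>s3://my-hadoop-bucket</value>

    # 3) bundle and upload on the instance, then register from your local
    #    machine (paths and account ID are placeholders)
    ec2-bundle-vol -d /mnt -k /mnt/pk.pem -c /mnt/cert.pem -u 111122223333
    ec2-upload-bundle -b my-ami-bucket -m /mnt/image.manifest.xml \
        -a $AWS_ACCESS_KEY_ID -s $AWS_SECRET_ACCESS_KEY
    ec2-register my-ami-bucket/image.manifest.xml

    # 4) on your local machine, in hadoop-ec2-env.sh:
    #    S3_BUCKET=my-ami-bucket
    #    HADOOP_VERSION=<your new AMI name>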
Kind regards
============================================
Steve Watt
IBM Certified IT Architect
Open Group Certified Master IT Architect
Tel: (512) 286 - 9170
Tie: 363 - 9170
Emerging Technologies, Austin, TX
IBM Software Group
============================================
Re: Getting Hadoop Working on EC2/S3
Posted by Alexander Aristov <al...@gmail.com>.
Does your AWS (S3) secret key contain the "?" sign? If so, that can be the
cause. Regenerate the key in this case.
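If that is the problem, another way to sidestep it is to keep the
credentials out of the s3:// URI entirely and set them as config
properties instead; if I remember the property names right, something
like:

    <!-- hadoop-site.xml: pass the S3 credentials as properties
         instead of embedding them in the s3:// URI -->
    <property>
      <name>fs.s3.awsAccessKeyId</name>
      <value>YOUR_ACCESS_KEY_ID</value>
    </property>
    <property>
      <name>fs.s3.awsSecretAccessKey</name>
      <value>YOUR_SECRET_ACCESS_KEY</value>
    </property>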
I have also tried to use the create-image command, but I stopped all
attempts after constant failures. It was easier to make the AMI by hand.
Alexander
--
Best Regards
Alexander Aristov