Posted to common-user@hadoop.apache.org by Prasan Ary <vo...@yahoo.com> on 2008/03/25 21:07:15 UTC

Map/reduce with input files on S3

I am running hadoop on EC2. I want to run a jar MR application on EC2 such that input and output files are on S3.
   
  I configured hadoop-site.xml so that the fs.default.name property points to my S3 bucket with all required credentials (e.g. s3://<ID>:<secret key>@<bucket>). I created an input directory in this bucket and put an input file in it. Then I restarted hadoop so that the new configuration takes effect.
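For reference, a sketch of what such a hadoop-site.xml entry might look like. MYBUCKET, MYID and MYSECRET are placeholders, and note that instead of embedding the credentials in the URI, the jets3t-backed S3 filesystem also reads them from separate properties:

```xml
<!-- Sketch only: MYBUCKET, MYID and MYSECRET are placeholders. -->
<property>
  <name>fs.default.name</name>
  <value>s3://MYBUCKET</value>
</property>
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>MYID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>MYSECRET</value>
</property>
```

Keeping the keys out of the URI also avoids problems when the secret key contains a '/' character.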
   
  When I try to run the jar file now, I get the message "Hook previously registered"  and the application dies.
   
  Any idea what might have gone wrong?
   
  thanks.

       

Re: Map/reduce with input files on S3

Posted by Prasan Ary <vo...@yahoo.com>.
I changed the configuration a little so that the MR jar file now runs on my local hadoop cluster, but takes input files from S3.
  I get the following output:
  
 
  08/03/26 17:32:39 INFO mapred.FileInputFormat: Total input paths to process : 1
  08/03/26 17:32:44 INFO mapred.JobClient: Running job: job_200803031605_0476
  08/03/26 17:32:45 INFO mapred.JobClient: map 100% reduce 100%
  Job failed!
   
  A quick check of the jobtracker log file shows:
  2008-03-26 17:32:45,001 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_200803031605_0476'
  2008-03-26 17:19:18,301 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to /192.168.0.240:54311
  at org.apache.hadoop.ipc.Server.bind(Server.java:193)
  at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:252)
  at org.apache.hadoop.ipc.Server.<init>(Server.java:973)
  at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:393)
  at org.apache.hadoop.ipc.RPC.getServer(RPC.java:355)
  at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:639)
  at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:124)
  at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2114)
  2008-03-26 17:19:18,302 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: 
  /************************************************************
  SHUTDOWN_MSG: Shutting down JobTracker at 192.168.0.240/192.168.0.240
  ************************************************************/
  2008-03-26 17:32:10,784 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG: 
  /************************************************************
  STARTUP_MSG: Starting JobTracker
  STARTUP_MSG: host = 192.168.0.240/192.168.0.240
  STARTUP_MSG: args = []
  STARTUP_MSG: version = 0.16.0
  STARTUP_MSG: build = http://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16 -r 618351; compiled by 'hadoopqa' on Mon Feb 4 19:29:11 UTC 2008
  ************************************************************/
  2008-03-26 17:32:10,943 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to /192.168.0.240:54311
  at org.apache.hadoop.ipc.Server.bind(Server.java:193)
  at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:252)
  at org.apache.hadoop.ipc.Server.<init>(Server.java:973)
  at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:393)
  at org.apache.hadoop.ipc.RPC.getServer(RPC.java:355)
  at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:639)
  at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:124)
  at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:2114)
  2008-03-26 17:32:10,944 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG: 
  /************************************************************
  SHUTDOWN_MSG: Shutting down JobTracker at 192.168.0.240/192.168.0.240
  ************************************************************/
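The repeated BindException suggests the new JobTracker cannot start because something, most likely a previous JobTracker instance that never fully shut down, is still holding port 54311. A minimal standalone sketch of the underlying JDK behaviour (independent of Hadoop, using modern try-with-resources syntax):

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class BindDemo {
    // Returns true when a second bind on an already-bound port fails,
    // which is what the JobTracker log above reports for port 54311.
    static boolean secondBindFails() throws IOException {
        try (ServerSocket first = new ServerSocket(0)) { // OS picks a free port
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // should not happen while `first` is still open
            } catch (BindException e) {
                return true;  // the condition Hadoop wraps as "Problem binding to ..."
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("second bind fails: " + secondBindFails());
    }
}
```

So before restarting, it is worth checking that no stale JobTracker process is still listening on the mapred.job.tracker port.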
   
  Any help?
   
  ---------------------------------------------------------
  
Tom White <to...@gmail.com> wrote:
  > I wonder if it is related to:
> https://issues.apache.org/jira/browse/HADOOP-3027
>

I think it is - the same problem is fixed for me when using HADOOP-3027.

Tom


       

Re: Map/reduce with input files on S3

Posted by Tom White <to...@gmail.com>.
> I wonder if it is related to:
>  https://issues.apache.org/jira/browse/HADOOP-3027
>

I think it is - the same problem is fixed for me when using HADOOP-3027.

Tom

Re: Map/reduce with input files on S3

Posted by Prasan Ary <vo...@yahoo.com>.
Owen,
  Yes, I am using Hadoop 0.16.1.
  No, the JIRA doesn't relate to my case.
   
  The message "Hook previously registered" comes up only when I try to access files on S3 from my Java application running on EC2. The same application runs smoothly if the input file is copied onto the EC2 image and accessed from there.
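For what it's worth, "Hook previously registered" is the literal message the JDK itself throws when the same shutdown-hook Thread is registered with the Runtime twice, which fits the thread's suspicion that the S3 code path (per HADOOP-3027) is double-registering a hook. A minimal standalone reproduction of that JDK behaviour:

```java
public class HookDemo {
    // Returns the error message produced when the same hook Thread is
    // registered twice, or null if the second registration succeeds.
    static String registerTwice() {
        Thread hook = new Thread(() -> { /* cleanup would go here */ });
        Runtime.getRuntime().addShutdownHook(hook);
        try {
            Runtime.getRuntime().addShutdownHook(hook); // rejected by the JDK
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        } finally {
            // Unregister so the no-op hook does not linger until JVM exit.
            Runtime.getRuntime().removeShutdownHook(hook);
        }
    }

    public static void main(String[] args) {
        System.out.println(registerTwice());
    }
}
```

This only demonstrates where the message originates; the actual double registration happens inside Hadoop's S3 FileSystem code, not in the user application.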
   
  
------------------------------------------------------------------------------
  
Owen O'Malley <oo...@yahoo-inc.com> wrote:
  
On Mar 25, 2008, at 1:07 PM, Prasan Ary wrote:

> I am running hadoop on EC2. I want to run a jar MR application on 
> EC2 such that input and output files are on S3.

That is an expected case, although I haven't ever used EC2.

> When I try to run the jar file now, I get the message "Hook 
> previously registered" and the application dies.
>
> Any Idea what might have gone wrong?

I wonder if it is related to:
https://issues.apache.org/jira/browse/HADOOP-3027

Are you running 0.16.1?

-- Owen

       

Re: Map/reduce with input files on S3

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Mar 25, 2008, at 1:07 PM, Prasan Ary wrote:

> I am running hadoop on EC2. I want to run a jar MR application on  
> EC2 such that input and output files are on S3.

That is an expected case, although I haven't ever used EC2.

>   When I try to run the jar file now, I get the message "Hook  
> previously registered"  and the application dies.
>
>   Any Idea what might have gone wrong?

I wonder if it is related to:
https://issues.apache.org/jira/browse/HADOOP-3027

Are you running 0.16.1?

-- Owen