Posted to common-user@hadoop.apache.org by Hrishikesh Agashe <hr...@persistent.co.in> on 2009/11/13 17:43:56 UTC

How to call method after all map jobs on slaves nodes are done

Hi,

I am implementing the MapRunnable interface to create the map jobs, and I have a large data set to process (around 10 GB).

My cluster has 1 master and 10 slaves. When I run my program, Hadoop processes the data successfully. After processing, I collect all the data (all of it files) in the Hadoop temporary directory.

Now my requirement is this: when all maps are completed on each node, I want to call one method that will process the data from the temporary directory and finally copy those files to HDFS.

Is there any way to do this?

--Hrishi

DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.

Re: How to call method after all map jobs on slaves nodes are done

Posted by Y G <gy...@gmail.com>.
Is your job a map-only job, without any reduce? If it is, I think you
could set the number of reducers to 0; the map output will then be
written directly from the local nodes to HDFS instead of going through
the shuffle.
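This suggestion can be sketched as a job configuration using the old org.apache.hadoop.mapred API (the package MapRunnable belongs to). Everything here is a placeholder sketch, not code from the thread: MyMapRunner stands in for your MapRunnable implementation, and the input/output paths come from the command line. It needs a Hadoop installation to compile and run.

```java
// Sketch of a map-only job: zero reducers means each map task's output
// is written straight to the job's output directory on HDFS, with no
// shuffle or sort phase.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class MapOnlyJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setJobName("map-only-example");

        // Placeholder: your class implementing MapRunnable.
        conf.setMapRunnerClass(MyMapRunner.class);

        // The key line: no reduce phase at all.
        conf.setNumReduceTasks(0);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```

With zero reducers there are no part-r-* files; each map task writes its own part-m-* file (or equivalent, depending on the OutputFormat) directly into the output directory, which may remove the need for the separate copy-to-HDFS step.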


-- 
Sent from my mobile device

-----
Wishing you happiness every day
and good health

Re: How to call method after all map jobs on slaves nodes are done

Posted by Mike Kendall <mk...@justin.tv>.
I don't know anything about MapRunnable, but this would be pretty easy to do
with a bash script. All you do is list your commands in a text file and run
that file...

http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html

It sounds like you're going to want to do something like...

#!/bin/bash
set -e  # stop at the first step that fails

# run the Hadoop map jobs (jar and class names are placeholders)
hadoop jar mapjobs
# post-process the temporary directory locally
yourProcessor tempDir outDir
# copy the results up to HDFS
hadoop dfs -copyFromLocal outDir somewhereOnDfs

How does that sound?

-mike

(P.S. 10 nodes for a 10 GB job is way overkill; I have 4 nodes for TBs of
data.)
