Posted to mapreduce-user@hadoop.apache.org by John Armstrong <jo...@ccri.com> on 2011/07/26 21:28:46 UTC

Adding files to map/reduce classpath

I'm back to trying to add libraries to the classpath instead of handing
around a fat JAR.  This time I've served up my directory full of JARs on
NFS, which each node in my cluster has mounted at /mnt/hadoop-libs.  Now my
question is how to add that (local) directory to the classpath of the
mapper and reducer tasks.  I've tried adding "-classpath
/mnt/hadoop-libs/*" to mapred.map.child.java.opts, but it doesn't seem to
work; the actual classpath I can see being called is just the local
/usr/lib/hadoop/lib stuff as usual.
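The "/mnt/hadoop-libs/*" form relies on the JVM's classpath wildcard (Java 6 and later). That wildcard stands for every file directly in the directory whose name ends in ".jar" or ".JAR". Where an explicit list is needed instead, a minimal sketch of that expansion follows. The class name WildcardClasspath is hypothetical, and the sketch only builds the string. It does not change how Hadoop assembles the child task's classpath, which, as observed above, ignored the -classpath option.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: expands a directory roughly the way the JVM treats a
// "dir/*" classpath entry. It picks every regular file directly inside the
// directory whose name ends in ".jar" or ".JAR", with no recursion.
final class WildcardClasspath {
    private WildcardClasspath() {}

    static String expand(Path dir) throws IOException {
        List<String> jars = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                boolean isJar = name.endsWith(".jar") || name.endsWith(".JAR");
                if (isJar && Files.isRegularFile(entry)) {
                    jars.add(entry.toAbsolutePath().toString());
                }
            }
        }
        // The JVM does not promise any order for wildcard expansion;
        // sort so the result is deterministic.
        Collections.sort(jars);
        // File.pathSeparator is ":" on Unix and ";" on Windows.
        return String.join(File.pathSeparator, jars);
    }
}
```

For example, WildcardClasspath.expand(Paths.get("/mnt/hadoop-libs")) would return the JARs in that directory joined with ":" on Unix. Sorting is a choice of this sketch, made only to keep the output stable between runs.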

Re: Adding files to map/reduce classpath

Posted by John Armstrong <jo...@ccri.com>.
On Tue, 26 Jul 2011 12:35:48 -0700, Shrijeet Paliwal
<sh...@rocketfuel.com> wrote:
> **
> See if this (very old) reply from Mikhail helps.
> http://search-hadoop.com/m/QFVD1kEmQT
> Here is the patch he is referring to.
> http://m1.archiveorange.com/m/att/RNVYm/ArchiveOrange_8dEcdJI4bXFkKHBnsll8YzTc8u8a.patch
> 
> **replying in a hurry


Thanks; it looks like that would work, but it's a gamble whether the
client will be willing to install that patch.  Do you know if it's been
added in CDH3-beta-3?

Re: Merge Reducers Outputs

Posted by David Rosenstrauch <da...@darose.net>.
On 07/26/2011 06:52 PM, Mohamed Riadh Trad wrote:
> Dear All,
>
> Is it possible to set up a task with multiple reducers and merge the reducers' outputs into a single file?
>
> Bests,
>
> Trad Mohamed Riadh, M.Sc, Ing.

Not within the map-reduce job, but you can merge them after the job is 
done.  At my previous company we used FileUtil.copyMerge() to do this, 
and it worked quite well.

See:

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/fs/FileUtil.html#copyMerge%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path,%20org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path,%20boolean,%20org.apache.hadoop.conf.Configuration,%20java.lang.String%29
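FileUtil.copyMerge itself needs the Hadoop libraries. To show what the merge does without them, here is a minimal sketch of a local-filesystem analog. The class LocalCopyMerge is hypothetical and not part of Hadoop. It concatenates the part files in name order (part-00000 before part-00001) and optionally writes a separator string after each one, as copyMerge's final String argument does. Skipping names that start with "_" or "." (such as a _SUCCESS marker) is an extra choice of this sketch.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local analog of FileUtil.copyMerge: concatenates the regular
// files in srcDir, in file-name order, into dstFile. If addString is
// non-null, it is written after each file. dstFile must live outside srcDir.
final class LocalCopyMerge {
    private LocalCopyMerge() {}

    static void merge(Path srcDir, Path dstFile, String addString) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(srcDir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString();
                // Assumption of this sketch: skip marker/hidden files like _SUCCESS.
                if (name.startsWith("_") || name.startsWith(".")) {
                    continue;
                }
                if (Files.isRegularFile(entry)) {
                    parts.add(entry);
                }
            }
        }
        Collections.sort(parts); // part-00000, part-00001, ... in partition order
        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path part : parts) {
                Files.copy(part, out);
                if (addString != null) {
                    out.write(addString.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
    }
}
```

Against HDFS, a call using the signature from the Javadoc linked above would look something like FileUtil.copyMerge(fs, new Path("job-output"), fs, new Path("merged.txt"), false, conf, null), where the paths are placeholders and false means the source directory is kept.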

DR

Re: Merge Reducers Outputs

Posted by Arun C Murthy <ac...@hortonworks.com>.
No. Either your data is small enough that it can all go to a single reducer, or you can set up a (sampling) partitioner so that the partitions are sorted relative to each other, which gives you globally sorted output from multiple reduces. Take a look at the TeraSort example for this.
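The sampling-partitioner idea can be sketched without Hadoop. Sort a sample of the keys, take numPartitions - 1 evenly spaced sample keys as split points, and send each key to the partition found by binary search over those split points. Every key in partition i is then no greater than every key in partition i + 1, so the sorted reducer outputs, read in partition order, form one sorted result. The class below is hypothetical and uses String keys. TeraSort applies the same idea with its own sampling code, and Hadoop also provides TotalOrderPartitioner and InputSampler for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a sampling ("total order") range partitioner.
// Split points taken from a sorted sample send smaller keys to
// lower-numbered reducers.
final class SampledRangePartitioner {
    private final List<String> splitPoints; // sorted, numPartitions - 1 entries

    SampledRangePartitioner(List<String> sample, int numPartitions) {
        if (numPartitions < 1 || sample.isEmpty()) {
            throw new IllegalArgumentException("need a non-empty sample and >= 1 partition");
        }
        List<String> sorted = new ArrayList<>(sample);
        Collections.sort(sorted);
        splitPoints = new ArrayList<>();
        // Evenly spaced sample keys become the boundaries between partitions.
        for (int i = 1; i < numPartitions; i++) {
            splitPoints.add(sorted.get(i * sorted.size() / numPartitions));
        }
    }

    // Roughly: partition i receives keys k with splitPoints[i-1] <= k < splitPoints[i].
    int getPartition(String key) {
        int pos = Collections.binarySearch(splitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent;
        // the insertion point is the number of split points smaller than key.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }
}
```

For example, with the sample a, c, e, g, i, k and three partitions, the split points are e and i, so b goes to partition 0, f to partition 1 and z to partition 2.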

Arun

On Jul 26, 2011, at 3:52 PM, Mohamed Riadh Trad wrote:

> Dear All,
> 
> Is it possible to set up a task with multiple reducers and merge the reducers' outputs into a single file?
> 
> Bests,
> 
> Trad Mohamed Riadh, M.Sc, Ing.
> PhD. student
> INRIA-TELECOM PARISTECH - ENPC School of International Management
> 
> Office: 11-15
> Phone: (33)-1 39 63 59 33
> Fax: (33)-1 39 63 56 74
> Email: riadh.trad@inria.fr
> Home page: http://www-rocq.inria.fr/who/Mohamed.Trad/


Merge Reducers Outputs

Posted by Mohamed Riadh Trad <Mo...@inria.fr>.
Dear All,

Is it possible to set up a task with multiple reducers and merge the reducers' outputs into a single file?

Bests,

Trad Mohamed Riadh, M.Sc, Ing.
PhD. student
INRIA-TELECOM PARISTECH - ENPC School of International Management

Office: 11-15
Phone: (33)-1 39 63 59 33
Fax: (33)-1 39 63 56 74
Email: riadh.trad@inria.fr
Home page: http://www-rocq.inria.fr/who/Mohamed.Trad/

Re: Adding files to map/reduce classpath

Posted by Shrijeet Paliwal <sh...@rocketfuel.com>.
**
See if this (very old) reply from Mikhail helps.
http://search-hadoop.com/m/QFVD1kEmQT
Here is the patch he is referring to.
http://m1.archiveorange.com/m/att/RNVYm/ArchiveOrange_8dEcdJI4bXFkKHBnsll8YzTc8u8a.patch

**replying in a hurry

On Tue, Jul 26, 2011 at 12:28 PM, John Armstrong
<jo...@ccri.com> wrote:
> I'm back to trying to add libraries to the classpath instead of handing
> around a fat JAR.  This time I've served up my directory full of JARs on
> NFS, which each node in my cluster has mounted at /mnt/hadoop-libs.  Now my
> question is how to add that (local) directory to the classpath of the
> mapper and reducer tasks.  I've tried adding "-classpath
> /mnt/hadoop-libs/*" to mapred.map.child.java.opts, but it doesn't seem to
> work; the actual classpath I can see being called is just the local
> /usr/lib/hadoop/lib stuff as usual.
>