Posted to yarn-dev@hadoop.apache.org by Dharanikumar Bodla <dh...@coartha.com> on 2014/02/24 15:01:45 UTC

Reg: MapReduce and Streaming Job with HDP 2.0, YARN and MapReduce2

Hi all,
Good morning.
I have a set of 378 documents in the form of text files, 200-500 MB each, loaded into HDFS. Running a Hadoop streaming map/reduce job over them from the command line took 48 minutes 43 seconds. How can I speed up the map/reduce processing so that these text files finish in 10-15 seconds?
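For scale, here is a rough back-of-the-envelope calculation of the total data volume and the current throughput (the ~350 MB average file size is just an assumed midpoint of the 200-500 MB range, not a measured value):

```shell
# Rough sizing of the job described above.
files=378
avg_mb=350                       # assumed midpoint of 200-500 MB
runtime_s=$((48 * 60 + 43))      # 48 min 43 sec
total_mb=$((files * avg_mb))
echo "total input: $((total_mb / 1024)) GB"         # ~129 GB
echo "throughput:  $((total_mb / runtime_s)) MB/s"  # ~45 MB/s
```

At that volume, reaching 10-15 seconds would require on the order of a thousand times the current throughput, so I suspect hardware, not just configuration, is the limit.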
What changes do I need to make on Hadoop 2.0 with MapReduce2 and YARN? My current setup:
cores = 2
2 GB of memory allocated to YARN, and 400 GB for HDFS
default virtual memory for a map task = 1024 MB
default virtual memory for a reduce task = 512 MB
mapreduce.map.java.opts = -Xmx512m
mapreduce.reduce.java.opts = -Xmx256m
map-side sort buffer = 256 MB
and only 75% of YARN capacity is used for this process.
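In case it clarifies what I am doing, this is roughly how I pass those settings to the streaming job on the command line (the jar path, input/output paths, and mapper/reducer scripts below are placeholders, not my actual job):

```shell
# Sketch only: paths and scripts are placeholders for illustration.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
  -D mapreduce.map.memory.mb=1024 \
  -D mapreduce.reduce.memory.mb=512 \
  -D mapreduce.map.java.opts=-Xmx512m \
  -D mapreduce.reduce.java.opts=-Xmx256m \
  -D mapreduce.task.io.sort.mb=256 \
  -input /data/textfiles \
  -output /data/out \
  -mapper mapper.py \
  -reducer reducer.py
```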

Thanks & regards,
Bodla Dharani Kumar,