Posted to common-user@hadoop.apache.org by Aishwarya Venkataraman <av...@cs.ucsd.edu> on 2012/05/17 23:06:55 UTC

dfs.replication factor for MR jobs

Hello,

I have a 4-node cluster: one namenode and 3 datanodes. I want to
explicitly set dfs.replication to 1 in order to run some
experiments. I tried setting this via the hdfs-site.xml file and via
the command line as well (hadoop dfs -setrep -R -w 1 /). But I have a
feeling that the replication factor that HDFS is seeing is 3. It seems
to be writing the temporary mapper outputs to all 3 datanodes. Is
this the default configuration for MR jobs? If not, how can I set this
to 1?

Thanks,
Aishwarya

Re: dfs.replication factor for MR jobs

Posted by Aishwarya Venkataraman <av...@cs.ucsd.edu>.
Apologies, this works now if I set dfs.replication=1 when I launch
the job, i.e.:

hadoop jar foo.jar com.foo -D dfs.replication=1 input output
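
For anyone following the thread: -setrep only changes files that already
exist, while new files get their replication factor from the configuration
of the client that writes them, which is why passing -D at launch has an
effect. One way to confirm what was actually applied is to list the job's
output files. The following is a minimal sketch, not a definitive check. It
assumes the usual `hadoop fs -ls` listing, where the second column is the
replication factor and directories show "-" there, and it assumes paths
without spaces. The helper name show_replication is hypothetical and not
part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): print "<replication> <path>"
# for every file directly under the given HDFS directory.
# Assumes the standard 8-column `hadoop fs -ls` layout:
#   permissions replication owner group size date time path
show_replication() {
  dir="$1"
  # Skip the "Found N items" header (fewer than 8 fields) and
  # directory entries (permissions start with "d").
  hadoop fs -ls "$dir" | awk 'NF >= 8 && $1 !~ /^d/ { print $2, $NF }'
}
```

Running `show_replication output` after the job above should print one line
per output file, and the first field of each line is the replication factor
HDFS recorded for that file.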

-- 
Thanks,
Aishwarya Venkataraman
avenkata[at]cs[dot]ucsd[dot]edu
Graduate Student | Department of Computer Science
University of California, San Diego

Re: dfs.replication factor for MR jobs

Posted by Aishwarya Venkataraman <av...@cs.ucsd.edu>.
The MR job that I'm running has zero reducers (sorry, I should have
mentioned this earlier). It's a mapper-only job.

Thanks,


On Thu, May 17, 2012 at 2:31 PM, Abhishek Pratap Singh
<ma...@gmail.com> wrote:
> Hi Aishwarya,
>
> The temporary output of the mapper is used by the reducer, and the number
> of reduce jobs is based on the output keys of the mapper. It has nothing
> to do with the replication factor. It is writing to three nodes because at
> least three keys have been generated by the mapper and reducers have been
> assigned to three different nodes.
>
> Regards,
> Abhishek

Re: dfs.replication factor for MR jobs

Posted by Abhishek Pratap Singh <ma...@gmail.com>.
Hi Aishwarya,

The temporary output of the mapper is used by the reducer, and the number
of reduce jobs is based on the output keys of the mapper. It has nothing
to do with the replication factor. It is writing to three nodes because at
least three keys have been generated by the mapper and reducers have been
assigned to three different nodes.

Regards,
Abhishek