Posted to common-user@hadoop.apache.org by Ted Dunning <td...@maprtech.com> on 2013/01/01 00:51:06 UTC

Re: Does mapred.local.dir is important factor in reducer side?

Good point, Harsh.

Of course, you still need a distributed file system that can match local file system performance and handle really big flash loads. That isn't trivial to find.

On Dec 31, 2012, at 11:28 AM, Harsh J <ha...@cloudera.com> wrote:

> Ted,
> 
> Do note that the local directory configs accept URIs in 2.x releases,
> allowing users to plug alternative filesystems if they wanted to.
> 
> On Tue, Jan 1, 2013 at 12:47 AM, Ted Dunning <td...@maprtech.com> wrote:
>> Hadoop: The Definitive Guide is only talking about Apache, CDH and
>> Hortonworks here.
>> 
>> The MapR distribution does not have this limitation and thus is one solution
>> for this problem.
>> 
>> Another solution is to do partial aggregation, such as with a combiner.
>> 
>> 
>> On Mon, Dec 31, 2012 at 8:14 AM, Majid Azimi <ma...@gmail.com>
>> wrote:
>>> 
>>> Hadoop: The Definitive Guide says:
>>> intermediate results on the mapper side are written to local disk at the
>>> mapred.local.dir location, so if this location does not have enough space the
>>> map will fail.
>>> 
>>> I want to know if this is also true on the reducer side. The output of all
>>> mappers is merged on the reducer side. In which location does this merge
>>> happen? If that location does not have enough space, does the reducer fail?
>>> What is the solution for MapReduce jobs if the intermediate results for some
>>> keys are larger than the local disk of the reducer?
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
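
For readers following the combiner suggestion earlier in the thread, here is a minimal sketch of how partial aggregation shrinks the intermediate data that has to be spilled and shuffled. It is a plain Python simulation of a word count, not Hadoop API code; the function names (map_phase, combine, reduce_phase) and the two input splits are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit one (word, 1) pair per word, as a word-count mapper would."""
    for line in lines:
        for word in line.split():
            yield word, 1


def combine(pairs):
    """Partially aggregate a single mapper's output before it leaves the mapper."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def reduce_phase(pairs):
    """Sum all values per key, as the reducer would after the merge."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Two hypothetical input splits, one per mapper.
splits = [
    ["a a a b", "a b a"],
    ["a a c", "a a a"],
]

raw = [list(map_phase(split)) for split in splits]
combined = [combine(pairs) for pairs in raw]

# Number of intermediate records with and without the combiner.
raw_records = sum(len(pairs) for pairs in raw)
combined_records = sum(len(pairs) for pairs in combined)

result_without = reduce_phase(pair for pairs in raw for pair in pairs)
result_with = reduce_phase(pair for pairs in combined for pair in pairs)

print("intermediate records without combiner:", raw_records)
print("intermediate records with combiner:", combined_records)
print("final counts:", result_with)
```

Both paths produce the same final counts, but the combined path hands far fewer records to the reducer. This only works when the aggregation is associative and commutative (sums, counts, min, max). A job that needs every raw value for a key gets no relief from a combiner.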