Posted to mapreduce-user@hadoop.apache.org by Libo Yu <yu...@hotmail.com> on 2014/05/09 03:17:35 UTC

spilled records

Hi, 

According to "Hadoop: The Definitive Guide", when mapreduce.job.shuffle.input.buffer.percent is
large enough, the map outputs are copied directly into the reduce JVM's memory.

I set this parameter to 0.5, which is large enough to hold the map outputs, but #spilled records is still the same
as #reduce input records. Does anybody know why? Thanks.
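
For reference, here is a minimal sketch (my own illustration, assuming the Hadoop 2.x Java API; the property names below are the current reduce-side shuffle settings, which may not match the name used in the book) of how these buffers can be set on a job:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ShuffleBufferExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Fraction of the reducer's heap used to buffer map outputs during the copy phase.
            conf.setFloat("mapreduce.reduce.shuffle.input.buffer.percent", 0.5f);
            // Fraction of the reducer's heap that retained map outputs may occupy during the
            // reduce phase itself; the default 0.0 pushes them to disk before the reduce runs.
            conf.setFloat("mapreduce.reduce.input.buffer.percent", 0.5f);
            Job job = Job.getInstance(conf, "shuffle-buffer-example");
            // ... set mapper, reducer, input and output paths as usual, then job.waitForCompletion(true)
        }
    }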

Libo



RE: spilled records

Posted by java8964 <ja...@hotmail.com>.
Your first understanding is not correct. Where did you get that interpretation from the book?
About the #spilled records counter: every record a mapper outputs is spilled at least once, so in the ideal scenario the spilled count equals the mappers' output record count. If it does not, and the spilled count is much larger than the mappers' output record count, then you may need to adjust the "io.sort.mb" configuration.
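As an illustration only (property names below are the Hadoop 2.x equivalents of io.sort.mb, not taken from this thread), a sketch of enlarging the map-side sort buffer so each record is spilled only once:

    import org.apache.hadoop.conf.Configuration;

    public class SortBufferExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // In-memory sort buffer for map output, in MB (io.sort.mb in older releases; default 100).
            conf.setInt("mapreduce.task.io.sort.mb", 256);
            // Fraction of the buffer that triggers a background spill to disk (default 0.80).
            conf.setFloat("mapreduce.map.sort.spill.percent", 0.80f);
            // If a map task's entire output fits in the buffer, it is spilled exactly once,
            // so the Spilled Records counter matches the Map output records counter.
        }
    }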
Yong 

