Posted to common-user@hadoop.apache.org by sandeep das <ya...@gmail.com> on 2015/11/18 13:33:40 UTC

Data spilling on disk from MR jobs

Hi,

I'm running my Pig script on YARN (MR2). While going through the tuning
parameters, I found that the value of the parameter
"mapreduce.task.io.sort.mb" should be tuned properly. By default it is
set to 256 MB in my Cloudera setup.

I would like to know how I can find out whether my MR jobs are spilling
data to disk. Are there any logs that show how much data was spilled to
disk? Is there any parameter that can be configured to enable such
logging?

CDH: CDH-5.4.4-1.cdh5.4.4.p0.4
Hadoop: 2.6.0-cdh5.4.4

Let me know if more information is required.


Regards,
Sandeep