Posted to common-user@hadoop.apache.org by himanshu chandola <hi...@yahoo.com> on 2010/01/02 00:02:57 UTC

Re: large reducer output with same key

Thanks.

This is probably something trivial, but if you have any idea what could be causing it, that would be helpful. I changed mapred.local.dir to point at drives with more capacity, and the map jobs now fail with the following message:
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_200912311931_0002/attempt_200912311931_0002_m_000027_0/output/file.out.index in any of the configured local directories


This is weird because the file in question exists on that machine in that directory (taskTracker/jobcache....). The permissions are also correct, so I haven't been able to work out what the problem could be.
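
In case it helps, the entry now looks roughly like this (the mount points are illustrative, not my exact paths) and is the same on every node:

<property>
 <name>mapred.local.dir</name>
 <value>/data1/mapred-local,/data2/mapred-local</value>
</property>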

Do you have any ideas on this?

Thanks


 Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.



----- Original Message ----
From: Jason Venner <ja...@gmail.com>
To: common-user@hadoop.apache.org
Sent: Thu, December 31, 2009 1:46:47 PM
Subject: Re: large reducer output with same key

The mapred.local.dir parameter is used by each tasktracker node to
provide the directory (or directories) where it stores transitory data
about the tasks the tasktracker runs.
This includes the map output, which can be very large.
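
On each tasktracker, every task attempt gets its own subtree under one of
those directories; the layout is roughly the following (the ids here are
placeholders):

${mapred.local.dir}/taskTracker/jobcache/<job-id>/<attempt-id>/output/file.out
${mapred.local.dir}/taskTracker/jobcache/<job-id>/<attempt-id>/output/file.out.index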

On Thu, Dec 31, 2009 at 10:03 AM, himanshu chandola <
himanshu_coolguy@yahoo.com> wrote:

> Hi Todd,
> Are these directories supposed to be on the namenode or on each of the
> datanodes? In my case it is set to a directory inside /tmp, but the
> mapred.local.dir setting was present only on the namenode.
>
> Thanks for the help
>
> Himanshu
>
> Morpheus: Do you believe in fate, Neo?
> Neo: No.
> Morpheus: Why Not?
> Neo: Because I don't like the idea that I'm not in control of my life.
>
>
>
> ----- Original Message ----
> From: Todd Lipcon <to...@cloudera.com>
> To: common-user@hadoop.apache.org
> Sent: Thu, December 31, 2009 10:17:05 AM
> Subject: Re: large reducer output with same key
>
> Hi Himanshu,
>
> Sounds like your mapred.local.dir doesn't have enough space. My guess is
> that you've configured it somewhere inside /tmp/. Instead you should spread
> it across all of your local physical disks by comma-separating the
> directories in the configuration. Something like:
>
> <property>
>  <name>mapred.local.dir</name>
>  <value>/disk1/mapred-local,/disk2/mapred-local,/disk3/mapred-local</value>
> </property>
>
> (and of course make sure those directories exist and are writable by the
> user that runs your hadoop daemons, often "hadoop")
>
> Thanks
> -Todd
>
> On Thu, Dec 31, 2009 at 2:10 AM, himanshu chandola <
> himanshu_coolguy@yahoo.com> wrote:
>
> > Hi Everyone,
> > My reducer output results in most of the data having the same key. The
> > reducer output is close to 16 GB, and though my cluster has a terabyte
> > of space in HDFS in total, I get errors like the following:
> >
> > > at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:719)
> > > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:209)
> > > at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2084)
> > > Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException:
> > > Could not find any valid local directory for task_200808021906_0002_m_000014_2/spill4.out
> >
> > After such failures, hadoop tries to start the same reduce job a couple
> > of times on other nodes before the job fails. From the exception, it
> > looks to me like this is probably a disk error (some machines have less
> > than 16 GB of free space on HDFS).
> >
> > So my question is whether hadoop puts values which share the same key
> > into a single block on one node? Or could something else be happening
> > here?
> >
> > Thanks
> >
> > H
> >
> >
> >
> >
>
>
>
>


-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals




Re: large reducer output with same key

Posted by himanshu chandola <hi...@yahoo.com>.
It is in the tasktracker log. The job is the same as before, so the machine is definitely not heavily loaded.

It seems pretty weird that the data is written to the right mapred.local.dir but not read from there.

 Morpheus: Do you believe in fate, Neo?
Neo: No.
Morpheus: Why Not?
Neo: Because I don't like the idea that I'm not in control of my life.



----- Original Message ----
From: Jason Venner <ja...@gmail.com>
To: common-user@hadoop.apache.org
Sent: Sat, January 2, 2010 1:37:07 PM
Subject: Re: large reducer output with same key

I have only seen that type of error when the tasktracker machine is very
heavily loaded and the task does not exit in a timely manner after the
tasktracker terminates it.

Is this error in your task log or in the tasktracker log?


Re: large reducer output with same key

Posted by Jason Venner <ja...@gmail.com>.
I have only seen that type of error when the tasktracker machine is very
heavily loaded and the task does not exit in a timely manner after the
tasktracker terminates it.

Is this error in your task log or in the tasktracker log?



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals