Posted to mapreduce-user@hadoop.apache.org by Elton Pinto <ep...@gmail.com> on 2010/09/09 20:42:14 UTC

TOTAL_LAUNCHED_MAPS Counter

Does anyone know the difference between the Hadoop counter
TOTAL_LAUNCHED_MAPS and the "mapred.map.tasks" parameter available in the
JobConf?

We're seeing some situations where these two don't match up, and so we're
dropping data between jobs. We know we're dropping data because the bytes
written to HDFS by the first job don't match up with the bytes read into the
second job, and the number of input files is equal to "mapred.map.tasks". On
further analysis, the dropped data is legitimate (not duplicate data from
speculative execution or anything like that - though to be honest, speculative
execution is unlikely in these jobs because they're so fast).

Unfortunately, we run so many jobs that the JobTracker doesn't show us logs
older than about 20 minutes, so it's really hard to catch this problem in
progress.

Thanks,

Elton

eptiger@gmail.com
epinto@alumni.cs.utexas.edu
http://www.eltonpinto.net/

Re: TOTAL_LAUNCHED_MAPS Counter

Posted by Elton Pinto <ep...@gmail.com>.
Just a follow-up: you were right. It was the speculative execution tasks. It
turns out that because we weren't using the OutputCollector, we had a race
condition under speculative execution. We are going to refactor to use the
OutputCollector, but in the meantime we just turned off speculative execution.
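
Roughly, the refactor looks like the sketch below (old org.apache.hadoop.mapred
API; the class name and key/value types are illustrative, not our real code):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class SafeMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, Text> {

        // Emit through the OutputCollector instead of writing to HDFS directly.
        // The framework stages each attempt's output and only commits the
        // attempt that wins, so duplicate speculative attempts can't race on
        // the same file.
        public void map(LongWritable key, Text value,
                        OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            output.collect(new Text("illustrative-key"), value);
        }

        // The stopgap we're using until the refactor lands: turn speculation off.
        public static void disableSpeculation(JobConf conf) {
            conf.setMapSpeculativeExecution(false);
            conf.setReduceSpeculativeExecution(false);
        }
    }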

Thanks,

Elton

Re: TOTAL_LAUNCHED_MAPS Counter

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Sep 9, 2010, at 11:42 AM, Elton Pinto wrote:

> Does anyone know the difference between the Hadoop counter TOTAL_LAUNCHED_MAPS and the "mapred.map.tasks" parameter available in the JobConf? 

mapred.map.tasks is what Hadoop thinks you need at a minimum.

TOTAL_LAUNCHED_MAPS will be all map task attempts, including speculative execution and task recovery.
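
If you want to see the gap directly, something like this will print both
numbers after a job finishes. It's a sketch against the newer
org.apache.hadoop.mapreduce API; on 0.20 the same counters are reachable from
RunningJob.getCounters() in the old mapred API, and the property name differs
between versions ("mapred.map.tasks" vs. "mapreduce.job.maps"):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobCounter;

    public class LaunchedMapsCheck {
        public static void report(Job job) throws Exception {
            Configuration conf = job.getConfiguration();

            // What the framework planned: the configured/derived number of
            // map tasks ("mapred.map.tasks" in 0.20-era configs).
            int plannedMaps = conf.getInt("mapred.map.tasks", -1);

            // What actually ran: every map task *attempt*, including
            // speculative duplicates and re-executions after failures.
            long launchedMaps = job.getCounters()
                    .findCounter(JobCounter.TOTAL_LAUNCHED_MAPS)
                    .getValue();

            System.out.println("planned map tasks  : " + plannedMaps);
            System.out.println("launched map tasks : " + launchedMaps);
            // launchedMaps >= plannedMaps is normal; the surplus is
            // speculation and retries, not extra input splits.
        }
    }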


> We're seeing some situations where these two don't match up, and so we're dropping data between jobs.

... which given the above is fairly normal.

> We know we're dropping data because the bytes written to HDFS by the first job don't match up with the bytes read into the second job, and the number of input files is equal to "mapred.map.tasks".

I'm fairly certain that the byte counters include all written bytes, including data that is essentially thrown away due to the above.
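
A rough way to confirm that is to compare the counter against the size of the
committed output directory. Sketch only - the "FileSystemCounters" group and
"HDFS_BYTES_WRITTEN" counter follow the 0.20-era naming (group and counter
names vary across versions), and the output path is whatever your first job
actually writes:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class OutputByteCheck {
        public static void compare(Job job, Path outputDir) throws Exception {
            Configuration conf = job.getConfiguration();

            // Bytes reported by all task attempts, including attempts whose
            // output was later discarded (killed speculative duplicates,
            // re-executed failures).
            long counterBytes = job.getCounters()
                    .findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN")
                    .getValue();

            // Bytes that actually survived job commit in the output directory.
            long committedBytes = FileSystem.get(conf)
                    .getContentSummary(outputDir)
                    .getLength();

            System.out.println("counter HDFS_BYTES_WRITTEN : " + counterBytes);
            System.out.println("bytes in " + outputDir + " : " + committedBytes);
            // counterBytes > committedBytes is expected whenever speculative
            // or re-executed attempts wrote data that was thrown away.
        }
    }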

> On further analysis, the dropped data is legitimate (not duplicate data from speculative execution or anything like that - though to be honest, speculative execution is unlikely in these jobs because they're so fast).

It doesn't matter how fast.  Depending upon which version of Hadoop, it may launch speculatives if there are task cycles.  For example, I'm looking at a job on our grid right now that has 300 map tasks that average 40 seconds.  It got 96 spec exec tasks to go with those 300, for a total of 396 map tasks.
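
Worth double-checking whether speculation is even enabled for these jobs. A
tiny sketch using the 0.20-era property names (both default to true):

    import org.apache.hadoop.mapred.JobConf;

    public class SpeculationCheck {
        // Reports whether speculative execution is enabled for a job's maps
        // and reduces; both properties default to true, so fast jobs can still
        // pick up speculative attempts whenever there are spare task slots.
        public static void report(JobConf conf) {
            System.out.println("map speculation   : "
                    + conf.getBoolean("mapred.map.tasks.speculative.execution", true));
            System.out.println("reduce speculation: "
                    + conf.getBoolean("mapred.reduce.tasks.speculative.execution", true));
        }
    }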

> Unfortunately, we run so many jobs that the JobTracker doesn't show us logs older than about 20 minutes, so it's really hard to catch this problem in progress.

All of the log data should still be on the job tracker.  You just can't use the GUI to see it. :)