Posted to common-user@hadoop.apache.org by Aaron Kimball <aa...@cloudera.com> on 2009/01/23 10:29:27 UTC

_temporary directory getting deleted mid-job?

I saw some puzzling behavior tonight when running a MapReduce program I
wrote.

It performed the mapping just fine and began to shuffle. The reduce got
to 33% complete (end of shuffle), and then the task failed, claiming
that <output_dir>/_temporary had been deleted.

I didn't touch HDFS while this was going on.

I reran the job several times, and the failure recurred twice more.
Meanwhile, I was running bin/hadoop fs -ls <output_dir> periodically in
another window. The _temporary directory was created just fine, but at some
point after shuffling began, it was removed.
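
To pin down when the directory disappears, the periodic -ls check could be
scripted instead of run by hand. The following is only a minimal Python
sketch, not something from the original run: the names dir_exists and watch
are hypothetical, and it assumes that bin/hadoop fs -ls exits with a non-zero
status when the path is missing, which should be verified on the Hadoop
version in use.

```python
import subprocess
import time
from datetime import datetime, timezone


def dir_exists(path, hadoop_cmd="bin/hadoop"):
    """Return True if `hadoop fs -ls <path>` exits with status 0.

    Assumption: a missing path makes the command exit non-zero.
    """
    result = subprocess.run(
        [hadoop_cmd, "fs", "-ls", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def watch(path, exists=dir_exists, interval=1.0, max_polls=None,
          sleep=time.sleep):
    """Poll `path` and record each created/removed transition.

    Returns a list of (UTC timestamp, event) tuples. With max_polls=None
    the loop runs until interrupted.
    """
    events = []
    previous = None  # unknown until the first poll
    polls = 0
    while max_polls is None or polls < max_polls:
        current = exists(path)
        if previous is not None and current != previous:
            event = "created" if current else "removed"
            stamp = datetime.now(timezone.utc).isoformat()
            events.append((stamp, event))
            print(stamp, path, event)
        previous = current
        polls += 1
        sleep(interval)
    return events
```

Calling watch() on the job's <output_dir>/_temporary path while the job runs
would print a UTC timestamp for each transition, which could then be compared
against the task and NameNode logs for the same moment.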

I tried to see if I could manually race this, so I did a mkdir _temporary,
and the job proceeded just fine. Even more bizarre, the removal of the
_temporary directory did not occur on any subsequent MR jobs (executions of
the same, unmodified program). So I can't reproduce the bug.

This is on 0.18.2.

It went away, so I'm not *too* concerned, but I'd rather not deal with
heisenbugs if at all possible.

So: has anyone seen this behavior? Have you figured out how to reproduce it,
or even better, prevent it?

Thanks,
- Aaron