Posted to common-user@hadoop.apache.org by Nathan Marz <na...@rapleaf.com> on 2008/11/04 20:46:26 UTC
_temporary directories not deleted
Hello all,
Occasionally when running jobs, Hadoop fails to clean up the
"_temporary" directories it has left behind. This only appears to
happen when a task is killed (aka a speculative execution), and the
data that task has outputted so far is not cleaned up. Is this a known
issue in hadoop? Is the data from that task guaranteed to be duplicate
data of what was outputted by another task? Is it safe to just delete
this directory without worrying about losing data?
Thanks,
Nathan Marz
Rapleaf
Re: _temporary directories not deleted
Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Nathan Marz wrote:
> Hello all,
>
> Occasionally when running jobs, Hadoop fails to clean up the
> "_temporary" directories it has left behind. This only appears to
> happen when a task is killed (aka a speculative execution), and the
> data that task has outputted so far is not cleaned up. Is this a known
> issue in hadoop?
Yes. In some corner cases, a speculative task can create _temporary after
the cleanup has already run.
> Is the data from that task guaranteed to be duplicate data of what was
> outputted by another task? Is it safe to just delete this directory
> without worrying about losing data?
>
Yes. You are right. It is duplicate data created by the speculative
task. You can go ahead and delete it.
-Amareshwari
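The cleanup recommended in the reply can be sketched as follows. This is a minimal illustration and is not part of Hadoop. It assumes the job output sits on a local filesystem (for example, a job run in local mode) and that the job has already finished. `remove_leftover_temporary` is a hypothetical helper name. On HDFS the same step would normally use Hadoop's own tools instead, such as `hadoop fs -rmr <output>/_temporary` (newer releases spell this `hadoop fs -rm -r`), or `FileSystem.delete(path, true)` from Java.

```python
import shutil
from pathlib import Path

# Scratch directory Hadoop creates under a job's output path for
# in-progress task output.
TEMP_DIR_NAME = "_temporary"


def remove_leftover_temporary(output_dir: str) -> bool:
    """Delete <output_dir>/_temporary if it is present.

    Hypothetical helper. Assumes the job that wrote output_dir has finished,
    so anything left under _temporary came from killed (e.g. speculative)
    attempts and duplicates output that was already committed. Committed
    part files directly under output_dir are not touched.

    Returns True if a directory was removed, False if there was nothing to do.
    """
    temp_dir = Path(output_dir) / TEMP_DIR_NAME
    if not temp_dir.is_dir():
        return False
    # Recursively remove the attempt directories and any partial files in them.
    shutil.rmtree(temp_dir)
    return True
```

For example, `remove_leftover_temporary("/tmp/job-output")` returns `True` the first time it removes a leftover `_temporary` and `False` on later calls. The helper deliberately touches only `_temporary`, so the committed part files next to it are never at risk. Running it while the job is still in progress would be unsafe, because live attempts still write there.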