Posted to common-user@hadoop.apache.org by Nathan Marz <na...@rapleaf.com> on 2008/11/04 20:46:26 UTC

_temporary directories not deleted

Hello all,

Occasionally when running jobs, Hadoop fails to clean up the
"_temporary" directories it leaves behind. This only appears to
happen when a task is killed (that is, a speculative execution
attempt), and the data that task has written so far is not cleaned
up. Is this a known issue in Hadoop? Is the data from that task
guaranteed to duplicate what was written by another task? Is it safe
to just delete this directory without worrying about losing data?

Thanks,
Nathan Marz
Rapleaf

Re: _temporary directories not deleted

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Nathan Marz wrote:
> Hello all,
>
> Occasionally when running jobs, Hadoop fails to clean up the
> "_temporary" directories it leaves behind. This only appears to
> happen when a task is killed (that is, a speculative execution
> attempt), and the data that task has written so far is not cleaned
> up. Is this a known issue in Hadoop?
Yes. In some corner cases, a speculative task can create _temporary
after the cleanup has already run.
> Is the data from that task guaranteed to duplicate what was
> written by another task? Is it safe to just delete this directory
> without worrying about losing data?
>
Yes. You are right. It is duplicate data created by the speculative 
task. You can go ahead and delete it.
-Amareshwari
> Thanks,
> Nathan Marz
> Rapleaf
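
For anyone scripting the cleanup described above, here is a minimal
sketch in Python. It only builds and optionally runs a delete command
for the leftover "_temporary" directory. It assumes the job has already
finished (while a job is still running, "_temporary" may hold in-progress
task output), that the "hadoop" command is on the PATH, and that the
installed version accepts "hadoop fs -rm -r" (older releases used
"hadoop fs -rmr" instead). The function names and the example path are
hypothetical and not part of Hadoop.

```python
import subprocess

# Name of the scratch directory discussed in this thread.
TEMP_DIR_NAME = "_temporary"


def temporary_path(output_dir):
    """Return the path of the _temporary directory under a job output dir."""
    return output_dir.rstrip("/") + "/" + TEMP_DIR_NAME


def build_delete_command(output_dir):
    """Build the shell command that removes the leftover directory.

    Assumption: the installed Hadoop accepts `hadoop fs -rm -r`;
    older releases used `hadoop fs -rmr`.
    """
    return ["hadoop", "fs", "-rm", "-r", temporary_path(output_dir)]


def delete_leftover_temporary(output_dir, dry_run=True):
    """Delete <output_dir>/_temporary, or only print the command.

    Assumption: the job that wrote to output_dir has already completed.
    Returns the exit code of the command (0 in dry-run mode).
    """
    cmd = build_delete_command(output_dir)
    if dry_run:
        # Show what would be executed without touching the filesystem.
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)


if __name__ == "__main__":
    # Hypothetical output directory; replace with your job's output path.
    delete_leftover_temporary("/user/example/job-output", dry_run=True)
```

Running it with dry_run=True first lets you check the exact path
before anything is removed.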