Posted to common-user@hadoop.apache.org by Sandhya E <sa...@gmail.com> on 2009/04/28 09:01:29 UTC

intermediate files of killed tasks not purged

Hi

Under <hadoop-tmp-dir>/mapred/local there are directories with names like
"attempt_200904262046_0026_m_000002_0".
Each of these directories contains files named intermediate.1,
intermediate.2, intermediate.3, intermediate.4, intermediate.5.
There are many such directories, and all of them correspond to
killed task attempts. Because they contain huge intermediate files, we
ran into disk space issues.

They are cleaned up when the mapred cluster is restarted. But otherwise,
how can they be cleaned up without restarting the cluster?

The conf parameter "keep.failed.task.files" is set to "false" in our case.

Many Thanks
Sandhya
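
To see how much space such leftovers occupy, the directories described above can be listed with a short script. This is a minimal sketch, not part of Hadoop: the function names are hypothetical, and the name pattern is only inferred from the example "attempt_200904262046_0026_m_000002_0". It reports sizes and deletes nothing, since an attempt directory may belong to a task that is still running.

```python
import os
import re

# Pattern inferred from the attempt directory name quoted in the post
# (assumption: all attempt directories follow this shape).
ATTEMPT_DIR = re.compile(r"^attempt_\d+_\d+_[mr]_\d+_\d+$")


def dir_size(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def leftover_attempt_dirs(local_dir):
    """Map each attempt_* directory directly under local_dir to its size."""
    found = {}
    for entry in sorted(os.listdir(local_dir)):
        full = os.path.join(local_dir, entry)
        if os.path.isdir(full) and ATTEMPT_DIR.match(entry):
            found[entry] = dir_size(full)
    return found
```

Calling leftover_attempt_dirs() on the <hadoop-tmp-dir>/mapred/local path of a node returns a dictionary of directory names and byte counts, which can then be compared against the list of killed attempts shown by the jobtracker.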

Re: intermediate files of killed tasks not purged

Posted by Sandhya E <sa...@gmail.com>.

The attempt directories are in <hadoop-tmp>/mapred/local.

I grepped the tasktracker logs for one of the attempts that has leftovers
in mapred/local:
09/04/27 21:07:19 INFO mapred.TaskTracker: LaunchTaskAction: attempt_200902120108_44218_r_000000_0
09/04/27 21:07:29 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:32 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:38 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:41 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:47 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:53 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:56 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:02 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:08 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:11 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:17 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:23 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:26 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:29 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:32 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:39 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:45 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:48 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:08:54 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:09:00 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of 14 at 2.03 MB/s) >
09/04/27 21:09:06 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.33333334% reduce > sort
09/04/27 21:09:09 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.33333334% reduce > sort
09/04/27 21:09:12 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.7029736% reduce > reduce
09/04/27 21:09:15 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.771893% reduce > reduce
09/04/27 21:09:18 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.8495109% reduce > reduce
09/04/27 21:09:21 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.9042134% reduce > reduce
09/04/27 21:09:24 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.98041093% reduce > reduce
09/04/27 21:09:26 INFO mapred.TaskTracker: attempt_200902120108_44218_r_000000_0 0.99415195% reduce > reduce
09/04/27 21:09:26 INFO mapred.TaskTracker: Task attempt_200902120108_44218_r_000000_0 is done.
09/04/27 21:09:31 INFO mapred.TaskRunner: attempt_200902120108_44218_r_000000_0 done; removing files.

Regards
Sandhya

On Tue, Apr 28, 2009 at 2:39 PM, Amareshwari Sriramadasu
<am...@yahoo-inc.com> wrote:
> Again, where are you seeing the attempt ID directories? Are they at
> mapred/local/<attemptid> or at
> mapred/local/taskTracker/jobCache/<jobid>/<attemptid>?
> If you are seeing files at mapred/local/<attemptid>, then it is a bug. Please
> raise a JIRA and attach the TaskTracker logs if possible.
> If not, the mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories
> are cleaned up on a KillTaskAction, and the
> mapred/local/taskTracker/jobCache/<jobid> directories are cleaned up on a
> KillJobAction. Can you verify from the TaskTracker logs whether the attempt ID
> got a KillTaskAction or the job ID got a KillJobAction? If not, this is fixed
> by HADOOP-5247.
>
> Thanks
> Amareshwari

Re: intermediate files of killed tasks not purged

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.

Again, where are you seeing the attempt ID directories? Are they at
mapred/local/<attemptid> or at
mapred/local/taskTracker/jobCache/<jobid>/<attemptid>?
If you are seeing files at mapred/local/<attemptid>, then it is a bug.
Please raise a JIRA and attach the TaskTracker logs if possible.
If not, the mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories
are cleaned up on a KillTaskAction, and the
mapred/local/taskTracker/jobCache/<jobid> directories are cleaned up on a
KillJobAction. Can you verify from the TaskTracker logs whether the attempt ID
got a KillTaskAction or the job ID got a KillJobAction? If not, this is fixed
by HADOOP-5247.
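
One way to run this check over a TaskTracker log is sketched below. It is a minimal sketch under stated assumptions, not Hadoop code: the function names are hypothetical, and it only looks for the strings "LaunchTaskAction", "KillTaskAction", "KillJobAction" and "removing files" on lines that mention the attempt ID or its job ID. The exact wording of the kill messages in a given Hadoop version is an assumption, so the matching is deliberately loose. The job ID is derived from the attempt ID by the usual naming scheme (attempt_<ts>_<n>_... belongs to job_<ts>_<n>).

```python
import re


def job_id_of(attempt_id):
    """Derive the job ID from an attempt ID such as attempt_<ts>_<n>_r_000000_0."""
    m = re.match(r"^attempt_(\d+_\d+)_[mr]_\d+_\d+$", attempt_id)
    if m is None:
        raise ValueError("not an attempt ID: %r" % attempt_id)
    return "job_" + m.group(1)


def cleanup_events(log_lines, attempt_id):
    """Report which launch/kill/cleanup messages the log has for an attempt."""
    job_id = job_id_of(attempt_id)
    events = {
        "launched": False,       # a LaunchTaskAction line names the attempt
        "kill_task": False,      # a KillTaskAction line names the attempt
        "kill_job": False,       # a KillJobAction line names the job
        "files_removed": False,  # a "removing files" line names the attempt
    }
    for line in log_lines:
        if attempt_id in line:
            if "LaunchTaskAction" in line:
                events["launched"] = True
            if "KillTaskAction" in line:
                events["kill_task"] = True
            if "removing files" in line:
                events["files_removed"] = True
        if job_id in line and "KillJobAction" in line:
            events["kill_job"] = True
    return events
```

For an attempt whose directory is still on disk, a result with "kill_task" and "kill_job" both false would point to the missing-kill-action case mentioned above.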

Thanks
Amareshwari

Sandhya E wrote:
> Hi Amareshwari
>
> We are on version 0.18. I verified from the jobtracker website that not
> all killed tasks have leftovers in mapred/local. Also, some
> tasks that were successful have left their tmp folders in mapred/local.
>
> Can you please give some pointers on how to debug this further?
>
> Regards
> Sandhya

Re: intermediate files of killed tasks not purged

Posted by Sandhya E <sa...@gmail.com>.

Hi Amareshwari

We are on version 0.18. I verified from the jobtracker website that not
all killed tasks have leftovers in mapred/local. Also, some
tasks that were successful have left their tmp folders in mapred/local.

Can you please give some pointers on how to debug this further?

Regards
Sandhya

On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
<am...@yahoo-inc.com> wrote:
> Hi Sandhya,
>
> Which version of Hadoop are you using? Before 0.17, there could be
> <attempt_id> directories in mapred/local. Now there should not be any such
> directories.
> From version 0.17 onwards, the attempt directories are present only at
> mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are seeing the
> directories in any other location, then it seems like a bug.
>
> HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks; it
> does not change local file system files.
>
> Thanks
> Amareshwari

Re: intermediate files of killed tasks not purged

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.

Hi Sandhya,

Which version of Hadoop are you using? Before 0.17, there could be
<attempt_id> directories in mapred/local. Now there should not be any such
directories.
From version 0.17 onwards, the attempt directories are present only
at mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are
seeing the directories in any other location, then it seems like a bug.

HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks; it
does not change local file system files.
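
The layout described above can be written down as a small helper, which makes it easy to tell whether a given directory is where a 0.17+ TaskTracker is expected to put it. This is a minimal sketch with hypothetical function names, not Hadoop code. It assumes POSIX-style paths and the attempt ID shape seen in this thread, and derives the job ID from the attempt ID (attempt_<ts>_<n>_... belongs to job_<ts>_<n>).

```python
import posixpath
import re

# Attempt ID shape as seen in this thread (assumption).
ATTEMPT_ID = re.compile(r"^attempt_(\d+_\d+)_[mr]_\d+_\d+$")


def job_id_of(attempt_id):
    """Derive the job ID that an attempt ID belongs to."""
    m = ATTEMPT_ID.match(attempt_id)
    if m is None:
        raise ValueError("not an attempt ID: %r" % attempt_id)
    return "job_" + m.group(1)


def expected_attempt_dir(local_dir, attempt_id):
    """Path where the attempt directory should live from 0.17 onwards."""
    return posixpath.join(
        local_dir, "taskTracker", "jobCache", job_id_of(attempt_id), attempt_id
    )


def is_unexpected_location(local_dir, path):
    """True if path is an attempt directory sitting directly under local_dir."""
    parent, name = posixpath.split(path.rstrip("/"))
    same_parent = posixpath.normpath(parent) == posixpath.normpath(local_dir)
    return ATTEMPT_ID.match(name) is not None and same_parent
```

A directory for which is_unexpected_location() returns True is in the pre-0.17 location and would be worth reporting along with the TaskTracker logs.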

Thanks
Amareshwari

Edward J. Yoon wrote:
> Hi,
>
> It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.

Re: intermediate files of killed tasks not purged

Posted by "Edward J. Yoon" <ed...@apache.org>.

Hi,

It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.

On Tue, Apr 28, 2009 at 4:01 PM, Sandhya E <sa...@gmail.com> wrote:
> Hi
>
> Under <hadoop-tmp-dir>/mapred/local there are directories with names like
> "attempt_200904262046_0026_m_000002_0".
> Each of these directories contains files named intermediate.1,
> intermediate.2, intermediate.3, intermediate.4, intermediate.5.
> There are many such directories, and all of them correspond to
> killed task attempts. Because they contain huge intermediate files, we
> ran into disk space issues.
>
> They are cleaned up when the mapred cluster is restarted. But otherwise,
> how can they be cleaned up without restarting the cluster?
>
> The conf parameter "keep.failed.task.files" is set to "false" in our case.
>
> Many Thanks
> Sandhya
>



-- 
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org