Posted to common-user@hadoop.apache.org by Sandhya E <sa...@gmail.com> on 2009/04/28 09:01:29 UTC
intermediate files of killed tasks not purged
Hi
Under <hadoop-tmp-dir>/mapred/local there are directories named like
"attempt_200904262046_0026_m_000002_0".
Each of these directories contains files named intermediate.1,
intermediate.2, intermediate.3, intermediate.4, intermediate.5.
There are many such directories, and all of them correspond to
killed task attempts. Because they contain huge intermediate files, we
ran into disk space issues.
They are cleaned up when the mapred cluster is restarted. Otherwise,
how can they be cleaned up without restarting the cluster?
The conf parameter "keep.failed.task.files" is set to "false" in our case.
Many Thanks
Sandhya
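A read-only way to see how much space such leftover directories occupy can be sketched in Python. This is only an illustration under stated assumptions: the attempt directories sit directly under the local directory and are named attempt_*, as described in the message above. The function name leftover_attempt_dirs is hypothetical, and nothing is deleted.

```python
import os


def leftover_attempt_dirs(local_dir):
    """Return (name, total_bytes) for each attempt_* directory directly under local_dir.

    Assumption: local_dir is the <hadoop-tmp-dir>/mapred/local directory from the
    message above. This only reads sizes; it never removes anything.
    """
    results = []
    for entry in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, entry)
        if not (entry.startswith("attempt_") and os.path.isdir(path)):
            continue
        total = 0
        for root, _dirs, files in os.walk(path):
            for name in files:
                try:
                    total += os.path.getsize(os.path.join(root, name))
                except OSError:
                    # A file may disappear while a live TaskTracker is running.
                    pass
        results.append((entry, total))
    return results
```

Calling leftover_attempt_dirs on the mapred/local directory of a TaskTracker node returns one (directory name, size in bytes) pair per leftover attempt directory, which shows which attempts hold the most space before any cleanup decision is made.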
Re: intermediate files of killed tasks not purged
Posted by Sandhya E <sa...@gmail.com>.
The attempt directories are in <hadoop-tmp>/mapred/local.
I grepped the TaskTracker logs for one of the attempts that left files
behind in mapred/local:
09/04/27 21:07:19 INFO mapred.TaskTracker: LaunchTaskAction:
attempt_200902120108_44218_r_000000_0
09/04/27 21:07:29 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:32 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:38 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:41 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:47 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:53 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:07:56 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:02 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:08 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:11 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:17 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.0% reduce > copy >
09/04/27 21:08:23 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:08:26 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:08:29 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:08:32 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:08:39 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:08:45 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:08:48 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:08:54 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:09:00 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.14285716% reduce > copy (6 of
14 at 2.03 MB/s) >
09/04/27 21:09:06 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.33333334% reduce > sort
09/04/27 21:09:09 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.33333334% reduce > sort
09/04/27 21:09:12 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.7029736% reduce > reduce
09/04/27 21:09:15 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.771893% reduce > reduce
09/04/27 21:09:18 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.8495109% reduce > reduce
09/04/27 21:09:21 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.9042134% reduce > reduce
09/04/27 21:09:24 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.98041093% reduce > reduce
09/04/27 21:09:26 INFO mapred.TaskTracker:
attempt_200902120108_44218_r_000000_0 0.99415195% reduce > reduce
09/04/27 21:09:26 INFO mapred.TaskTracker: Task
attempt_200902120108_44218_r_000000_0 is done.
09/04/27 21:09:31 INFO mapred.TaskRunner:
attempt_200902120108_44218_r_000000_0 done; removing files.
Regards
Sandhya
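Checking what the TaskTracker logged for one attempt can be sketched in Python. Assumptions: each log record is on a single line in the real log file (the excerpt above is wrapped by the mail client), and the marker strings are simply the ones that appear in this thread; the exact wording of a KillTaskAction log line is not shown here, so matching on the literal action name is a guess. The function name events_for_attempt is hypothetical.

```python
# Marker strings taken from this thread; whether a kill is logged with the
# literal text "KillTaskAction" is an assumption.
MARKERS = ("LaunchTaskAction", "KillTaskAction", "done; removing files")


def events_for_attempt(log_lines, attempt_id):
    """Return the markers, in first-seen order, found on lines that mention attempt_id."""
    seen = []
    for line in log_lines:
        if attempt_id not in line:
            continue
        for marker in MARKERS:
            if marker in line and marker not in seen:
                seen.append(marker)
    return seen
```

Applied to the excerpt above (with each record joined onto one line), this finds the launch and the final "done; removing files" record but no KillTaskAction for attempt_200902120108_44218_r_000000_0.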
On Tue, Apr 28, 2009 at 2:39 PM, Amareshwari Sriramadasu
<am...@yahoo-inc.com> wrote:
> Again, where are you seeing the attempt ID directories? Are they at
> mapred/local/<attemptid> or at
> mapred/local/taskTracker/jobCache/<jobid>/<attemptid>?
> If you are seeing files at mapred/local/<attemptid>, then it is a bug. Please
> raise a JIRA and attach the TaskTracker logs if possible.
> If not, mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories are
> cleaned up on a KillTaskAction, and mapred/local/taskTracker/jobCache/<jobid>
> directories are cleaned up on a KillJobAction. Can you verify from the TaskTracker
> logs whether the attempt got a KillTaskAction or the job got a KillJobAction? If
> not, that is fixed by HADOOP-5247.
>
> Thanks
> Amareshwari
Re: intermediate files of killed tasks not purged
Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Again, where are you seeing the attempt ID directories? Are they at
mapred/local/<attemptid> or at
mapred/local/taskTracker/jobCache/<jobid>/<attemptid>?
If you are seeing files at mapred/local/<attemptid>, then it is a bug.
Please raise a JIRA and attach the TaskTracker logs if possible.
If not, mapred/local/taskTracker/jobCache/<jobid>/<attemptid> directories
are cleaned up on a KillTaskAction, and
mapred/local/taskTracker/jobCache/<jobid> directories are cleaned up on a
KillJobAction. Can you verify from the TaskTracker logs whether the attempt
got a KillTaskAction or the job got a KillJobAction? If not, that is fixed by
HADOOP-5247.
Thanks
Amareshwari
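The two locations discussed above can be told apart mechanically. The Python sketch below derives the job ID from an attempt ID and builds the jobCache path where the attempt directory is expected from 0.17 onwards. Assumptions: attempt IDs follow the attempt_<timestamp>_<jobnumber>_<m|r>_<tasknumber>_<attemptnumber> pattern seen in this thread, and the <jobid> directory uses the usual job_<timestamp>_<jobnumber> form. The function names are hypothetical.

```python
import posixpath
import re

# Pattern inferred from IDs such as attempt_200904262046_0026_m_000002_0.
ATTEMPT_RE = re.compile(r"^attempt_(\d+)_(\d+)_([mr])_(\d+)_(\d+)$")


def job_id_of(attempt_id):
    """Derive the job ID (assumed form job_<timestamp>_<jobnumber>) from an attempt ID."""
    match = ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("not an attempt ID: {!r}".format(attempt_id))
    return "job_{}_{}".format(match.group(1), match.group(2))


def expected_attempt_dir(local_dir, attempt_id):
    """Build <local_dir>/taskTracker/jobCache/<jobid>/<attemptid>."""
    return posixpath.join(
        local_dir, "taskTracker", "jobCache", job_id_of(attempt_id), attempt_id
    )
```

An attempt directory found directly under mapred/local, rather than at the path returned by expected_attempt_dir, is the case described above as a bug.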
Sandhya E wrote:
> Hi Amareshwari
>
> We are on version 0.18. I verified from the JobTracker web UI that not
> all killed tasks have leftovers in mapred/local. Also, some
> tasks that were successful have left their tmp folders in mapred/local.
>
> Can you please give some pointers on how to debug this further?
>
> Regards
> Sandhya
Re: intermediate files of killed tasks not purged
Posted by Sandhya E <sa...@gmail.com>.
Hi Amareshwari
We are on version 0.18. I verified from the JobTracker web UI that not
all killed tasks have leftovers in mapred/local. Also, some
tasks that were successful have left their tmp folders in mapred/local.
Can you please give some pointers on how to debug this further?
Regards
Sandhya
On Tue, Apr 28, 2009 at 2:02 PM, Amareshwari Sriramadasu
<am...@yahoo-inc.com> wrote:
> Hi Sandhya,
>
> Which version of Hadoop are you using? Before 0.17, there could be <attempt_id>
> directories in mapred/local. Now there should not be any such
> directories.
> From version 0.17 onwards, the attempt directories are present only at
> mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are seeing the
> directories in any other location, then it seems like a bug.
>
> HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks; it does
> not change local file system files.
>
> Thanks
> Amareshwari
Re: intermediate files of killed tasks not purged
Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Hi Sandhya,
Which version of Hadoop are you using? Before 0.17, there could be <attempt_id>
directories in mapred/local. Now there should not be any such
directories.
From version 0.17 onwards, the attempt directories are present only
at mapred/local/taskTracker/jobCache/<jobid>/<attemptid>. If you are
seeing the directories in any other location, then it seems like a bug.
HADOOP-4654 is about cleaning up temporary data in DFS for failed tasks; it
does not change local file system files.
Thanks
Amareshwari
Edward J. Yoon wrote:
> Hi,
>
> It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.
Re: intermediate files of killed tasks not purged
Posted by "Edward J. Yoon" <ed...@apache.org>.
Hi,
It seems related to https://issues.apache.org/jira/browse/HADOOP-4654.
On Tue, Apr 28, 2009 at 4:01 PM, Sandhya E <sa...@gmail.com> wrote:
> Hi
>
> Under <hadoop-tmp-dir>/mapred/local there are directories named like
> "attempt_200904262046_0026_m_000002_0".
> Each of these directories contains files named intermediate.1,
> intermediate.2, intermediate.3, intermediate.4, intermediate.5.
> There are many such directories, and all of them correspond to
> killed task attempts. Because they contain huge intermediate files, we
> ran into disk space issues.
>
> They are cleaned up when the mapred cluster is restarted. Otherwise,
> how can they be cleaned up without restarting the cluster?
>
> The conf parameter "keep.failed.task.files" is set to "false" in our case.
>
> Many Thanks
> Sandhya
>
--
Best Regards, Edward J. Yoon @ NHN, corp.
edwardyoon@apache.org
http://blog.udanax.org