Posted to common-user@hadoop.apache.org by Robert Dyer <ps...@gmail.com> on 2013/04/13 21:35:52 UTC

Job cleanup

What does the job cleanup task do?  My understanding was that it just
cleans up any intermediate/temporary files and moves the reducer output to
the output directory.  Does it do more?

One of my jobs runs, all maps and reduces finish, but then the job cleanup
task never finishes.  Instead it gets killed several times until the entire
job is killed:

Task attempt_201303272327_0772_m_000105_0 failed to report status for
600 seconds. Killing!


I suppose that since my reducers generate around 20 GB of output, moving
it simply takes too long?

Is it possible to disable speculative execution *only* for the cleanup task?
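
As far as I can tell, though, the "failed to report status" kill above
comes from the task-timeout mechanism, not from speculative execution:
the 600 seconds matches the default mapred.task.timeout of 600000 ms.
A blunt workaround would be to raise that timeout in the job
configuration.  A minimal sketch, assuming the classic MR1 property name
(the class and job names here are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class RaiseTaskTimeout {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Default is 600000 ms (the "600 seconds" in the log above).
    // Classic MR1 name; newer releases call it "mapreduce.task.timeout".
    conf.setLong("mapred.task.timeout", 30 * 60 * 1000L); // 30 minutes
    Job job = new Job(conf, "big-output-job");
    // ... configure mapper, reducer, input/output as usual, then submit ...
  }
}

Setting the timeout to 0 disables it entirely, but reporting progress
properly seems like the better fix.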

Re: Job cleanup

Posted by Robert Dyer <ps...@gmail.com>.
I think the problem is that I need to call progress() from my cleanup task.
How can I do this?

The commitJob() method in my custom
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter subclass [1]
only receives an org.apache.hadoop.mapreduce.JobContext [2], which has no
getProgressible() method like the old
org.apache.hadoop.mapred.JobContext [3] does.  One possible workaround is
sketched after the links below.

[1] http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.html#commitJob%28org.apache.hadoop.mapreduce.JobContext%29
[2] http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/JobContext.html
[3] http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobContext.html#getProgressible%28%29
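
One workaround: in the classic (MR1) runtime, the JobContext that actually
arrives at commitJob() appears to be an org.apache.hadoop.mapred.JobContext,
so a guarded downcast can recover the Progressable, and a background thread
can ping it while the slow commit runs.  A rough sketch that relies on that
implementation detail (the class name is mine; if the cast fails, it
degrades to a plain commit):

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.util.Progressable;

public class HeartbeatFileOutputCommitter extends FileOutputCommitter {

  public HeartbeatFileOutputCommitter(Path outputPath,
      TaskAttemptContext context) throws IOException {
    super(outputPath, context);
  }

  @Override
  public void commitJob(JobContext context) throws IOException {
    // The old-API JobContext exposes the cleanup task's Progressable.
    // Whether we actually get one here is an implementation detail of the
    // MR1 runtime, so guard the downcast.
    Progressable progressable = null;
    if (context instanceof org.apache.hadoop.mapred.JobContext) {
      progressable =
          ((org.apache.hadoop.mapred.JobContext) context).getProgressible();
    }

    Thread heartbeat = null;
    if (progressable != null) {
      final Progressable p = progressable;
      heartbeat = new Thread(new Runnable() {
        public void run() {
          while (!Thread.currentThread().isInterrupted()) {
            p.progress();               // resets the 600-second task timeout
            try {
              Thread.sleep(60 * 1000L); // ping once a minute
            } catch (InterruptedException e) {
              return;
            }
          }
        }
      });
      heartbeat.setDaemon(true);
      heartbeat.start();
    }

    try {
      super.commitJob(context);  // the slow part: promoting ~20 GB of output
    } finally {
      if (heartbeat != null) {
        heartbeat.interrupt();
      }
    }
  }
}

If the downcast turns out not to be available, the blunt timeout bump from
my first message is the fallback.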

On Sat, Apr 13, 2013 at 2:35 PM, Robert Dyer <ps...@gmail.com> wrote:

> What does the job cleanup task do?  My understanding was that it just
> cleans up any intermediate/temporary files and moves the reducer output
> to the output directory.  Does it do more?
>
> One of my jobs runs, all maps and reduces finish, but then the job cleanup
> task never finishes.  Instead it gets killed several times until the entire
> job is killed:
>
> Task attempt_201303272327_0772_m_000105_0 failed to report status for 600 seconds. Killing!
>
>
> I suppose that since my reducers generate around 20 GB of output, moving
> it simply takes too long?
>
> Is it possible to disable speculative execution *only* for the cleanup
> task?
>
