Posted to mapreduce-dev@hadoop.apache.org by "Tina Shan (Jira)" <ji...@apache.org> on 2021/03/09 07:15:00 UTC
[jira] [Created] (MAPREDUCE-7328) Job.monitorAndPrintJob function
can sleep for up to 596 hours when jobclient.progress.monitor.poll.interval
is misconfigured, causing the job to hang
Tina Shan created MAPREDUCE-7328:
------------------------------------
Summary: Job.monitorAndPrintJob function can sleep for up to 596 hours when jobclient.progress.monitor.poll.interval is misconfigured, causing the job to hang
Key: MAPREDUCE-7328
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7328
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: client
Affects Versions: 3.3.0
Reporter: Tina Shan
The monitoring loop in Job.monitorAndPrintJob sleeps between progress reports for an interval read from configuration, and there is little sanity checking on this value. When jobclient.progress.monitor.poll.interval is misconfigured to Integer.MAX_VALUE (2147483647 ms), a single sleep can last up to about 596 hours. The thread is stuck for that whole time and does not report progress to the user even if the job moves forward. We suggest capping the value or logging a warning.
{code:java}
public boolean monitorAndPrintJob()
    throws IOException, InterruptedException {
  ...
  while (!isComplete() || !reportedAfterCompletion) {
    if (isComplete()) {
      reportedAfterCompletion = true;
    } else {
      // Sleeps for the configured interval; no upper bound is applied here.
      Thread.sleep(progMonitorPollIntervalMillis);
    }
    ...
  }
  ...
}
{code}
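One way the suggested cap and warning might look is sketched below. This is only an illustration, not the actual Hadoop fix: the class PollIntervalGuard, the method sanitize, and both constants are hypothetical names and values chosen for this example. It also shows where the 596-hour figure comes from.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of Hadoop: clamps a configured poll interval.
final class PollIntervalGuard {
  // Assumed values for this sketch only; the real default and cap are up to the maintainers.
  static final int DEFAULT_POLL_INTERVAL_MS = 1000;
  static final int MAX_POLL_INTERVAL_MS = 60_000;

  // Returns the default for non-positive values, the cap for oversized ones,
  // and the configured value unchanged otherwise. Warns when it adjusts the value.
  static int sanitize(int configuredMs) {
    if (configuredMs < 1) {
      System.err.println("WARN: poll interval " + configuredMs
          + " ms is not positive; using " + DEFAULT_POLL_INTERVAL_MS + " ms");
      return DEFAULT_POLL_INTERVAL_MS;
    }
    if (configuredMs > MAX_POLL_INTERVAL_MS) {
      System.err.println("WARN: poll interval " + configuredMs
          + " ms exceeds cap; using " + MAX_POLL_INTERVAL_MS + " ms");
      return MAX_POLL_INTERVAL_MS;
    }
    return configuredMs;
  }

  public static void main(String[] args) {
    // Integer.MAX_VALUE is 2147483647 ms; integer division by 3600000 ms per hour gives 596.
    long hours = TimeUnit.MILLISECONDS.toHours(Integer.MAX_VALUE);
    System.out.println("Uncapped sleep: " + hours + " hours");
    System.out.println("Capped interval: " + sanitize(Integer.MAX_VALUE) + " ms");
  }
}
```

In this sketch the value would be sanitized once, when it is read from the configuration, so that Thread.sleep in the monitoring loop never receives an unbounded interval.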
Similar bug to MAPREDUCE-7327