You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-issues@hadoop.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2011/06/11 17:15:16 UTC

[jira] [Updated] (MAPREDUCE-2236) No task may execute due to an Integer overflow possibility

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J updated MAPREDUCE-2236:
-------------------------------

    Attachment: MAPREDUCE-2236.r2.diff

Rebased patch, should apply cleanly to current trunk.

> No task may execute due to an Integer overflow possibility
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2236
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2236
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.2
>         Environment: Linux, Hadoop 0.20.2
>            Reporter: Harsh J
>            Assignee: Harsh J
>            Priority: Critical
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2236.r1.diff, MAPREDUCE-2236.r2.diff
>
>
> If the attempts is configured to use Integer.MAX_VALUE, an overflow occurs inside TaskInProgress, and thereby no task is attempted by the cluster and the map tasks stay in pending state forever.
> For example, here's a job driver that causes this:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.fs.FSDataOutputStream;
> import org.apache.hadoop.fs.FileSystem;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.TextInputFormat;
> import org.apache.hadoop.mapred.lib.IdentityMapper;
> import org.apache.hadoop.mapred.lib.NullOutputFormat;
> @SuppressWarnings("deprecation")
> public class IntegerOverflow {
> 	/**
> 	 * @param args
> 	 * @throws IOException 
> 	 */
> 	@SuppressWarnings("deprecation")
> 	public static void main(String[] args) throws IOException {
> 		JobConf conf = new JobConf();
> 		
> 		Path inputPath = new Path("ignore");
> 		FileSystem fs = FileSystem.get(conf);
> 		if (!fs.exists(inputPath)) {
> 			FSDataOutputStream out = fs.create(inputPath);
> 			out.writeChars("Test");
> 			out.close();
> 		}
> 		
> 		conf.setInputFormat(TextInputFormat.class);
> 		conf.setOutputFormat(NullOutputFormat.class);
> 		FileInputFormat.addInputPath(conf, inputPath);
> 		
> 		conf.setMapperClass(IdentityMapper.class);
> 		conf.setNumMapTasks(1);
> 		// Problem inducing line follows.
> 		conf.setMaxMapAttempts(Integer.MAX_VALUE);
> 		
> 		// No reducer in this test, although setMaxReduceAttempts leads to the same problem.
> 		conf.setNumReduceTasks(0);
> 		
> 		JobClient.runJob(conf);
> 	}
> }
> {code}
> The above code will not let any map task run. Additionally, a log would be created inside JobTracker logs with the following information that clearly shows the overflow:
> {code}
> 2010-12-30 00:59:07,836 WARN org.apache.hadoop.mapred.TaskInProgress: Exceeded limit of -2147483648 (plus 0 killed) attempts for the tip 'task_201012300058_0001_m_000000'
> {code}
> The issue lies inside the TaskInProgress class (/o/a/h/mapred/TaskInProgress.java), at line 1018 (trunk), part of the getTaskToRun(String taskTracker) method.
> {code}
>   public Task getTaskToRun(String taskTracker) throws IOException {   
>     // Create the 'taskid'; do not count the 'killed' tasks against the job!
>     TaskAttemptID taskid = null;
>     /* ============ THIS LINE v ====================================== */
>     if (nextTaskId < (MAX_TASK_EXECS + maxTaskAttempts + numKilledTasks)) {
>     /* ============ THIS LINE ^====================================== */
>       // Make sure that the attempts are unqiue across restarts
>       int attemptId = job.getNumRestarts() * NUM_ATTEMPTS_PER_RESTART + nextTaskId;
>       taskid = new TaskAttemptID( id, attemptId);
>       ++nextTaskId;
>     } else {
>       LOG.warn("Exceeded limit of " + (MAX_TASK_EXECS + maxTaskAttempts) +
>               " (plus " + numKilledTasks + " killed)"  + 
>               " attempts for the tip '" + getTIPId() + "'");
>       return null;
>     }
> {code}
> Since all three variables being added are integer in type, one of them being Integer.MAX_VALUE makes the condition fail with an overflow, thereby logging and returning a null as the result is negative.
> One solution would be to make one of these variables into a long, so the addition does not overflow?

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira