Posted to common-user@hadoop.apache.org by Ken Krugler <kk...@transpac.com> on 2010/04/16 19:00:17 UTC

Issue with getTaskTrackers

Hi all,

I'm running 0.19.2 in EC2, and running into an occasional problem with  
ClusterStatus.getTaskTrackers().

The call to getTaskTrackers() is made in the job jar's main
function, before the job starts running. I need to control some aspects
of my job there, for example setting the number of reduce tasks to be
exactly equal to the number of servers, which should be equal to the
number of task trackers.

Every so often (currently < 5% of the time) the call to getTaskTrackers() will
return a value less than expected - e.g. 2 instead of 6. This happens  
even when ClusterStatus.getJobTrackerState() returns State.RUNNING.

I'm assuming the problem is that some of the task trackers are taking
extra time to spin up. I saw HADOOP-5337
(https://issues.apache.org/jira/browse/HADOOP-5337), which seems related,
though that's for restarts vs. initial startup.

Given that the JobTracker waits for slaves to self-report, there  
doesn't seem to be a totally reliable, automatic solution to this  
issue, but I thought I'd ask to see if there's something I'm missing.
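
One possible workaround, given as a rough sketch rather than a confirmed
fix: poll the tracker count until it reaches the expected number of
servers or a timeout passes, then decide whether to proceed. The class
name TrackerWait, the method waitForCount, and its parameters are all
hypothetical, and the count is passed in as a plain IntSupplier so the
sketch does not depend on Hadoop. It also assumes a Java version with
lambdas, which is newer than what 0.19.2 clusters typically ran.

```java
import java.util.function.IntSupplier;

// Hypothetical helper: not part of Hadoop.
final class TrackerWait {
    private TrackerWait() {}

    /**
     * Polls `current` until it reports at least `expected`, or until
     * `timeoutMillis` has elapsed. Returns the last observed count, so the
     * caller can tell whether the cluster ever reached the expected size.
     */
    static int waitForCount(IntSupplier current, int expected,
                            long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int seen = current.getAsInt();
        while (seen < expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            seen = current.getAsInt();
        }
        return seen;
    }
}
```

In the job's main function the supplier would presumably wrap
jobClient.getClusterStatus().getTaskTrackers(), with the IOException
from getClusterStatus() handled inside the lambda. If the returned
count is still below the expected value after the timeout, the job can
either fail fast or fall back to the smaller number of reduce tasks.
This only narrows the window; a tracker that never reports in is still
indistinguishable from one that is slow to start.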

Thanks,

-- Ken

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g