Posted to mapreduce-dev@hadoop.apache.org by "Rares Vernica (JIRA)" <ji...@apache.org> on 2010/08/11 19:20:16 UTC
[jira] Created: (MAPREDUCE-2004) IP address vs host name in
updating Counter.DATA_LOCAL_MAPS
IP address vs host name in updating Counter.DATA_LOCAL_MAPS
-----------------------------------------------------------
Key: MAPREDUCE-2004
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2004
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Affects Versions: 0.20.2
Reporter: Rares Vernica
Priority: Minor
Hello,
I set "mapred.task.cache.levels" to 1 so that I have only
data-local-map tasks. Still, looking at the data-local-maps
counter, it seems not all map tasks are local. I checked each map task
to see where it ran and which split had been assigned to it, and all the
maps were actually processing only local data. (BTW, replication was
set to 1.)
I looked into the JobClient to see what information is there for each
split. For each file, the first n-1 splits have an IP address as
location while the n-th split has a host name as location. The reason
for this is that there is a different code path in deciding the
location for the first n-1 splits versus the n-th split. The maps that
processed the splits where the location was a host name were counted
as data-local-maps while the others were not.
So, regardless of whether the JobClient gives IP addresses or host names
for splits, the job works fine. The problem is that the data-local-maps
counter does not take this difference into consideration.
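
To illustrate the suspected mismatch, here is a minimal sketch. It is not the
actual JobTracker code: the class LocalityCheck, both methods, the addresses,
and the ipToHost table (a stand-in for whatever name resolution the cluster
uses) are all hypothetical. It only shows how a plain string comparison would
miss a split whose location is an IP address, and how normalizing the location
first would count it as local.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from the Hadoop source.
class LocalityCheck {

    // Naive check: a split counts as local only if one of its location
    // strings is exactly equal to the tracker's host name.
    static boolean isLocalNaive(String[] splitLocations, String trackerHost) {
        for (String loc : splitLocations) {
            if (loc.equals(trackerHost)) {
                return true;
            }
        }
        return false;
    }

    // Normalized check: translate IP addresses to host names before
    // comparing. ipToHost is an assumed lookup table; locations that are
    // not in the table are compared unchanged.
    static boolean isLocalNormalized(String[] splitLocations, String trackerHost,
                                     Map<String, String> ipToHost) {
        for (String loc : splitLocations) {
            String host = ipToHost.getOrDefault(loc, loc);
            if (host.equals(trackerHost)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> ipToHost = new HashMap<>();
        ipToHost.put("10.0.0.5", "node5.example.com"); // made-up mapping

        String tracker = "node5.example.com";
        String[] splitWithIp = {"10.0.0.5"};             // like the first n-1 splits
        String[] splitWithHost = {"node5.example.com"};  // like the n-th split

        System.out.println("naive, IP location:        " + isLocalNaive(splitWithIp, tracker));
        System.out.println("naive, host location:      " + isLocalNaive(splitWithHost, tracker));
        System.out.println("normalized, IP location:   " + isLocalNormalized(splitWithIp, tracker, ipToHost));
        System.out.println("normalized, host location: " + isLocalNormalized(splitWithHost, tracker, ipToHost));
    }
}
```

With the naive comparison, only the split that carries a host name is treated
as local, which matches what I see in the counter.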
Cheers,
Rares