Posted to common-user@hadoop.apache.org by Rares Vernica <rv...@gmail.com> on 2010/08/10 23:58:08 UTC

possible bug in updating Counter.DATA_LOCAL_MAPS

Hello,

I set "mapred.task.cache.levels" to 1 so that I have only
data-local-map tasks. Still, looking at the data-local-maps
counter, it seems that not all map tasks are local. I checked each map task
to see where it ran and which split was assigned to it, and all the
maps were actually processing only local data. (BTW, replication was
set to 1.)

I looked into the JobClient to see what information is there for each
split. For each file, the first n-1 splits have an IP address as
location while the n-th split has a host name as location. The reason
for this is that there is a different code path in deciding the
location for the first n-1 splits versus the n-th split. The maps that
processed the splits where the location was a host name were counted
as data-local-maps while the others were not.

So the job works fine regardless of whether the JobClient gives IP
addresses or host names for the splits. The problem is that the
data-local-maps counter does not take this difference into account.
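
To illustrate the mismatch, here is a minimal sketch in plain Java. It
is not the JobTracker's actual code; the class SplitLocality and its
methods are hypothetical names made up for this example. It assumes the
locality check boils down to comparing a split's location string with
the host a task ran on, and contrasts a plain string comparison with
one that resolves both sides to IP addresses first.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Hadoop.
public class SplitLocality {

    // Naive check: a split location given as an IP address never
    // equals a host name, even if both refer to the same machine.
    static boolean naiveIsDataLocal(String[] splitLocations, String taskHost) {
        for (String location : splitLocations) {
            if (location.equals(taskHost)) {
                return true;
            }
        }
        return false;
    }

    // Resolve a host name or IP literal to the set of its IP address
    // strings. If resolution fails, fall back to the raw string.
    static Set<String> addressesOf(String location) {
        Set<String> addresses = new HashSet<>();
        try {
            for (InetAddress address : InetAddress.getAllByName(location)) {
                addresses.add(address.getHostAddress());
            }
        } catch (UnknownHostException e) {
            addresses.add(location);
        }
        return addresses;
    }

    // Normalized check: local if any split location shares at least
    // one resolved address with the host the task ran on.
    static boolean isDataLocal(String[] splitLocations, String taskHost) {
        Set<String> taskAddresses = addressesOf(taskHost);
        for (String location : splitLocations) {
            if (!Collections.disjoint(addressesOf(location), taskAddresses)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // One split reported by IP address, task host reported by name.
        String[] splitLocations = {"127.0.0.1"};
        String taskHost = "localhost";
        System.out.println("naive:      " + naiveIsDataLocal(splitLocations, taskHost));
        System.out.println("normalized: " + isDataLocal(splitLocations, taskHost));
    }
}
```

Resolving names on every comparison would be too slow for a real
scheduler; a fix would more likely make the split locations consistent
when they are created. The sketch only shows why mixed IP addresses and
host names can make a local map look non-local to a counter.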

Cheers,
Rares

Re: possible bug in updating Counter.DATA_LOCAL_MAPS

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Thanks!

On Aug 11, 2010, at 10:20 AM, Rares Vernica wrote:

> Hi Arun,
>
> On Wed, Aug 11, 2010 at 8:41 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>>
>> This sounds like a good bug to fix - can you please open a JIRA?
>
> I created https://issues.apache.org/jira/browse/MAPREDUCE-2004
>
> Cheers,
> Rares


Re: possible bug in updating Counter.DATA_LOCAL_MAPS

Posted by Rares Vernica <rv...@gmail.com>.
Hi Arun,

On Wed, Aug 11, 2010 at 8:41 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>
> This sounds like a good bug to fix - can you please open a JIRA?

I created https://issues.apache.org/jira/browse/MAPREDUCE-2004

Cheers,
Rares

Re: possible bug in updating Counter.DATA_LOCAL_MAPS

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Rares,

This sounds like a good bug to fix - can you please open a JIRA?

thanks,
Arun

On Aug 10, 2010, at 2:58 PM, Rares Vernica wrote:

> Hello,
>
> I set "mapred.task.cache.levels" to 1 so that I have only
> data-local-map tasks. Still, looking at the data-local-maps
> counter, it seems that not all map tasks are local. I checked each map task
> to see where it ran and which split was assigned to it, and all the
> maps were actually processing only local data. (BTW, replication was
> set to 1.)
>
> I looked into the JobClient to see what information is there for each
> split. For each file, the first n-1 splits have an IP address as
> location while the n-th split has a host name as location. The reason
> for this is that there is a different code path in deciding the
> location for the first n-1 splits versus the n-th split. The maps that
> processed the splits where the location was a host name were counted
> as data-local-maps while the others were not.
>
> So the job works fine regardless of whether the JobClient gives IP
> addresses or host names for the splits. The problem is that the
> data-local-maps counter does not take this difference into account.
>
> Cheers,
> Rares