Posted to hdfs-user@hadoop.apache.org by "Chen, Haifeng" <ha...@intel.com> on 2013/01/08 04:07:22 UTC

Some mappers are much slower than others in reading data from HDFS

Dear sir,
I have encountered a strange problem: sometimes (not always), all the mappers on some nodes are much slower than the mappers on other nodes, as shown below. I don't see any reason why they should slow down in this pattern.

000013(MAP on node4): --------(8.115)
000014(MAP on node4): --------(8.570)
000011(MAP on node4): --------(8.5)
000016(MAP on node4): --------(8.344)
000010(MAP on node4): --------(8.585)
000015(MAP on node4): --------(8.179)
000017(MAP on node4): --------(8.445)
000012(MAP on node4): --------(8.312)
000018(MAP on node2):  ---(3.367)
000020(MAP on node2):  ---(3.335)
000019(MAP on node2):  ---(3.320)
000023(MAP on node2):  ---(3.91)
000022(MAP on node2):  ---(3.371)
000021(MAP on node2):  ---(3.458)
000004(MAP on node3):   -------------------(19.624)
000007(MAP on node3):   -------------------(19.92)
000005(MAP on node3):   --------------------(20.613)
000008(MAP on node3):   --------------------(20.316)
000003(MAP on node3):   --------------------(20.574)
000006(MAP on node3):   --------------------(20.654)
000002(MAP on node3):   -------------------(19.935)
000009(MAP on node3):   --------------------(20.489)
000025(MAP on node2):      --(2.877)
000026(MAP on node2):      ---(3.112)
000027(MAP on node2):      --(2.959)
000024(MAP on node2):      --(2.845)
000029(MAP on node2):       --(2.863)
000028(MAP on node2):       --(2.933)
000031(MAP on node2):          --(2.596)
000030(RED on node2):          -------------(13.378)

The test setup is as follows:
I have a 4-node cluster, and all of the nodes have the same hardware and software configuration. One node acts as the name node and YARN resource manager. The other three nodes act as both data nodes and YARN node managers.

The test input file is around 7 GB on the HDFS cluster, and the replication factor is 3. (This means that each data node has a copy of every block of the file.)

The mapper does nothing and does not write out any records:

public static class KeyMapper
    extends Mapper<Object, Text, Text, Text> {
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Intentionally empty: the framework still reads every record in
    // the split, but nothing is emitted.
  }
}

So each mapper effectively just reads and iterates through its input split and then finishes.
I don't see any factor in the above configuration that would cause this behavior.
I turned on debug logging for each mapper task, and it also showed that every mapper's DFSClient read its data from the local data node.
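
(A minimal driver for a job like this might look roughly like the sketch below. This is a hypothetical reconstruction for illustration only, not the attached client code; the class name, paths, and single-reducer setting are placeholders chosen to match the description above.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReadOnlyJobDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "hdfs read test");
    job.setJarByClass(ReadOnlyJobDriver.class);
    job.setMapperClass(KeyMapper.class);   // the empty mapper shown above (assumed visible here)
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setNumReduceTasks(1);              // one reducer, matching the single RED task in the chart
    FileInputFormat.addInputPath(job, new Path(args[0]));    // e.g. the ~7 GB test file
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (mostly empty here)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}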

Can any experts give some hints on this? I have attached the log and client code for analysis.

Thanks,
Haifeng

Re: Some mappers are much slower than others in reading data from HDFS

Posted by Andy Isaacson <ad...@cloudera.com>.
It's hard to speculate on the minimal information you've provided so
far. Seems like it's time to break out the performance analysis
toolkit and see what's going on differently between the fast and the
slow nodes. I'd look at raw performance with dd, then watch behavior
during mapper runs with vmstat or iostat and top, and investigate any
configuration differences between the nodes.

-andy
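
To make the raw-performance check concrete, a rough Java stand-in for a dd-style sequential read could be run on each data node against the same large local file, and the throughput compared across nodes. This is only a sketch; the file path and buffer size are assumptions, and the number reflects disk speed only if the file is not already in the OS page cache.

import java.io.FileInputStream;
import java.io.IOException;

public class RawReadCheck {
  public static void main(String[] args) throws IOException {
    // Path to a large local file on the data node's disk (placeholder).
    String path = args.length > 0 ? args[0] : "/data/dfs/dn/bigfile";
    byte[] buf = new byte[4 * 1024 * 1024];  // 4 MB read buffer
    long bytes = 0;
    long start = System.nanoTime();
    try (FileInputStream in = new FileInputStream(path)) {
      int n;
      while ((n = in.read(buf)) > 0) {
        bytes += n;  // just count bytes; only throughput matters
      }
    }
    double secs = (System.nanoTime() - start) / 1e9;
    System.out.printf("read %d bytes in %.2f s (%.1f MB/s)%n",
        bytes, secs, bytes / secs / 1e6);
  }
}

Watching vmstat or iostat (and top) while the mappers run should then show whether the slow nodes are waiting on disk, CPU, or something else.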

On Sun, Jan 13, 2013 at 6:05 PM, Chen, Haifeng <ha...@intel.com> wrote:
> Thank for your response, Andy.
>
> I think node2 has more mappers than others because the mappers on node2 move on faster and finish earlier than the mappers on node3 and node4. When the first wave of mappers on node2 (00018, 00019, ...) finished and there are more mappers to run, then the resource manager allocated another wave of the mappers to node2 (00025, 00026, ...). And still, the mappers on node2 are the ones runs fastest.
>
> Is it possible that it is because of cache distribution(disk cache?) Does data node hold any cache of recently accessed data?
>
> Regards,
> Haifeng
>
>
> -----Original Message-----
> From: Andy Isaacson [mailto:adi@cloudera.com]
> Sent: Wednesday, January 09, 2013 5:46 AM
> To: user@hadoop.apache.org
> Subject: Re: Some mappers are much slower than others in reading data from HDFS
>
> Your output shows that node2 has 13 mappers and the reducer, while
> node3 and node4 had only 8 mappers each.  So I'd expect some
> disparity. Since it's hard to correlate the mapper throughput against
> the reducer throughput, it's possible that node3 got just as much work
> done.
>
> That doesn't explain why node4 is slower than node3, though.
>
> -andy
>
> On Mon, Jan 7, 2013 at 7:07 PM, Chen, Haifeng <ha...@intel.com> wrote:
>> Dear sir,
>>
>> I encountered a strange problem that all the mappers on some nodes are much
>> slower than the mappers on other nodes as following some times (not always).
>> I didn't see any reasons why they should slow down in this pattern.
>>
>>
>>
>> 000013(MAP on node4): --------(8.115)
>>
>> 000014(MAP on node4): --------(8.570)
>>
>> 000011(MAP on node4): --------(8.5)
>>
>> 000016(MAP on node4): --------(8.344)
>>
>> 000010(MAP on node4): --------(8.585)
>>
>> 000015(MAP on node4): --------(8.179)
>>
>> 000017(MAP on node4): --------(8.445)
>>
>> 000012(MAP on node4): --------(8.312)
>>
>> 000018(MAP on node2):  ---(3.367)
>>
>> 000020(MAP on node2):  ---(3.335)
>>
>> 000019(MAP on node2):  ---(3.320)
>>
>> 000023(MAP on node2):  ---(3.91)
>>
>> 000022(MAP on node2):  ---(3.371)
>>
>> 000021(MAP on node2):  ---(3.458)
>>
>> 000004(MAP on node3):   -------------------(19.624)
>>
>> 000007(MAP on node3):   -------------------(19.92)
>>
>> 000005(MAP on node3):   --------------------(20.613)
>>
>> 000008(MAP on node3):   --------------------(20.316)
>>
>> 000003(MAP on node3):   --------------------(20.574)
>>
>> 000006(MAP on node3):   --------------------(20.654)
>>
>> 000002(MAP on node3):   -------------------(19.935)
>>
>> 000009(MAP on node3):   --------------------(20.489)
>>
>> 000025(MAP on node2):      --(2.877)
>>
>> 000026(MAP on node2):      ---(3.112)
>>
>> 000027(MAP on node2):      --(2.959)
>>
>> 000024(MAP on node2):      --(2.845)
>>
>> 000029(MAP on node2):       --(2.863)
>>
>> 000028(MAP on node2):       --(2.933)
>>
>> 000031(MAP on node2):          --(2.596)
>>
>> 000030(RED on node2):          -------------(13.378)
>>
>>
>>
>> The testing is as following:
>>
>> I have a 4 nodes cluster and all of them has the same hardware and software
>> configurations. One node acts as name node and yarn resource manager.  Other
>> three nodes act as both data node and yarn node manager.
>>
>>
>>
>> The test input file is around 7GB file on the HDFS cluster and the
>> replication number is 3. (This means that each data node has a copy of every
>> block of the file)
>>
>>
>>
>> The mapper did nothing and didn't write out any records:
>>
>>
>>
>> public static class KeyMapper
>>
>>        extends Mapper<Object, Text, Text, Text>{
>>
>>     public void map(Object key, Text value, Context context
>>
>>                     ) throws IOException, InterruptedException {
>>
>>
>>
>>     }
>>
>>   }
>>
>>
>>
>> So this mapper logically is reading and iterating through its splits of data
>> and then finish the job.
>>
>> I didn't see any factors in the above configurations that will cause the
>> above phenomenon.
>>
>> I turned on the debug log for each mapper task and it also showed that all
>> the mapper's DFSClient read data from its local data node.
>>
>>
>>
>> Can any experts help give some hints for this? I attached the log and client
>> code for analysis.
>>
>>
>>
>> Thanks,
>>
>> Haifeng

RE: Some mappers are much slower than others in reading data from HDFS

Posted by "Chen, Haifeng" <ha...@intel.com>.
Thanks for your response, Andy.

I think node2 got more mappers than the others because the mappers on node2 progress faster and finish earlier than the mappers on node3 and node4. When the first wave of mappers on node2 (00018, 00019, ...) finished and there were still mappers left to run, the resource manager allocated another wave of mappers to node2 (00025, 00026, ...). And still, the mappers on node2 are the ones that run fastest.

Is it possible that this is because of cache distribution (disk cache)? Does the data node hold any cache of recently accessed data?

Regards,
Haifeng
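
One way to probe the page-cache question might be to time two consecutive reads of the same HDFS file from one of the data nodes: blocks are served through ordinary file I/O on the data node, so if the second pass is much faster, the speedup would be coming from the operating system's page cache rather than from HDFS itself. A rough sketch follows; the class name and path are placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadTiming {
  // Read the whole file once and return the elapsed time in seconds.
  static double timeRead(FileSystem fs, Path path) throws Exception {
    byte[] buf = new byte[4 * 1024 * 1024];
    long start = System.nanoTime();
    try (FSDataInputStream in = fs.open(path)) {
      while (in.read(buf) > 0) {
        // discard the data; only the read time matters
      }
    }
    return (System.nanoTime() - start) / 1e9;
  }

  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Placeholder path; point it at the ~7 GB test input file.
    Path path = new Path(args.length > 0 ? args[0] : "/test/input/bigfile");
    System.out.printf("first read:  %.2f s%n", timeRead(fs, path));   // likely from disk
    System.out.printf("second read: %.2f s%n", timeRead(fs, path));   // may hit the OS page cache
  }
}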


-----Original Message-----
From: Andy Isaacson [mailto:adi@cloudera.com] 
Sent: Wednesday, January 09, 2013 5:46 AM
To: user@hadoop.apache.org
Subject: Re: Some mappers are much slower than others in reading data from HDFS

Your output shows that node2 has 13 mappers and the reducer, while
node3 and node4 had only 8 mappers each.  So I'd expect some
disparity. Since it's hard to correlate the mapper throughput against
the reducer throughput, it's possible that node3 got just as much work
done.

That doesn't explain why node4 is slower than node3, though.

-andy

On Mon, Jan 7, 2013 at 7:07 PM, Chen, Haifeng <ha...@intel.com> wrote:
> Dear sir,
>
> I encountered a strange problem that all the mappers on some nodes are much
> slower than the mappers on other nodes as following some times (not always).
> I didn't see any reasons why they should slow down in this pattern.
>
>
>
> 000013(MAP on node4): --------(8.115)
>
> 000014(MAP on node4): --------(8.570)
>
> 000011(MAP on node4): --------(8.5)
>
> 000016(MAP on node4): --------(8.344)
>
> 000010(MAP on node4): --------(8.585)
>
> 000015(MAP on node4): --------(8.179)
>
> 000017(MAP on node4): --------(8.445)
>
> 000012(MAP on node4): --------(8.312)
>
> 000018(MAP on node2):  ---(3.367)
>
> 000020(MAP on node2):  ---(3.335)
>
> 000019(MAP on node2):  ---(3.320)
>
> 000023(MAP on node2):  ---(3.91)
>
> 000022(MAP on node2):  ---(3.371)
>
> 000021(MAP on node2):  ---(3.458)
>
> 000004(MAP on node3):   -------------------(19.624)
>
> 000007(MAP on node3):   -------------------(19.92)
>
> 000005(MAP on node3):   --------------------(20.613)
>
> 000008(MAP on node3):   --------------------(20.316)
>
> 000003(MAP on node3):   --------------------(20.574)
>
> 000006(MAP on node3):   --------------------(20.654)
>
> 000002(MAP on node3):   -------------------(19.935)
>
> 000009(MAP on node3):   --------------------(20.489)
>
> 000025(MAP on node2):      --(2.877)
>
> 000026(MAP on node2):      ---(3.112)
>
> 000027(MAP on node2):      --(2.959)
>
> 000024(MAP on node2):      --(2.845)
>
> 000029(MAP on node2):       --(2.863)
>
> 000028(MAP on node2):       --(2.933)
>
> 000031(MAP on node2):          --(2.596)
>
> 000030(RED on node2):          -------------(13.378)
>
>
>
> The testing is as following:
>
> I have a 4 nodes cluster and all of them has the same hardware and software
> configurations. One node acts as name node and yarn resource manager.  Other
> three nodes act as both data node and yarn node manager.
>
>
>
> The test input file is around 7GB file on the HDFS cluster and the
> replication number is 3. (This means that each data node has a copy of every
> block of the file)
>
>
>
> The mapper did nothing and didn't write out any records:
>
>
>
> public static class KeyMapper
>
>        extends Mapper<Object, Text, Text, Text>{
>
>     public void map(Object key, Text value, Context context
>
>                     ) throws IOException, InterruptedException {
>
>
>
>     }
>
>   }
>
>
>
> So this mapper logically is reading and iterating through its splits of data
> and then finish the job.
>
> I didn't see any factors in the above configurations that will cause the
> above phenomenon.
>
> I turned on the debug log for each mapper task and it also showed that all
> the mapper's DFSClient read data from its local data node.
>
>
>
> Can any experts help give some hints for this? I attached the log and client
> code for analysis.
>
>
>
> Thanks,
>
> Haifeng

Re: Some mappers are much slower than others in reading data from HDFS

Posted by Andy Isaacson <ad...@cloudera.com>.
Your output shows that node2 has 13 mappers and the reducer, while
node3 and node4 had only 8 mappers each.  So I'd expect some
disparity. Since it's hard to correlate the mapper throughput against
the reducer throughput, it's possible that node3 got just as much work
done.

That doesn't explain why node4 is slower than node3, though.

-andy

On Mon, Jan 7, 2013 at 7:07 PM, Chen, Haifeng <ha...@intel.com> wrote:
> Dear sir,
>
> I encountered a strange problem that all the mappers on some nodes are much
> slower than the mappers on other nodes as following some times (not always).
> I didn’t see any reasons why they should slow down in this pattern.
>
>
>
> 000013(MAP on node4): --------(8.115)
>
> 000014(MAP on node4): --------(8.570)
>
> 000011(MAP on node4): --------(8.5)
>
> 000016(MAP on node4): --------(8.344)
>
> 000010(MAP on node4): --------(8.585)
>
> 000015(MAP on node4): --------(8.179)
>
> 000017(MAP on node4): --------(8.445)
>
> 000012(MAP on node4): --------(8.312)
>
> 000018(MAP on node2):  ---(3.367)
>
> 000020(MAP on node2):  ---(3.335)
>
> 000019(MAP on node2):  ---(3.320)
>
> 000023(MAP on node2):  ---(3.91)
>
> 000022(MAP on node2):  ---(3.371)
>
> 000021(MAP on node2):  ---(3.458)
>
> 000004(MAP on node3):   -------------------(19.624)
>
> 000007(MAP on node3):   -------------------(19.92)
>
> 000005(MAP on node3):   --------------------(20.613)
>
> 000008(MAP on node3):   --------------------(20.316)
>
> 000003(MAP on node3):   --------------------(20.574)
>
> 000006(MAP on node3):   --------------------(20.654)
>
> 000002(MAP on node3):   -------------------(19.935)
>
> 000009(MAP on node3):   --------------------(20.489)
>
> 000025(MAP on node2):      --(2.877)
>
> 000026(MAP on node2):      ---(3.112)
>
> 000027(MAP on node2):      --(2.959)
>
> 000024(MAP on node2):      --(2.845)
>
> 000029(MAP on node2):       --(2.863)
>
> 000028(MAP on node2):       --(2.933)
>
> 000031(MAP on node2):          --(2.596)
>
> 000030(RED on node2):          -------------(13.378)
>
>
>
> The testing is as following:
>
> I have a 4 nodes cluster and all of them has the same hardware and software
> configurations. One node acts as name node and yarn resource manager.  Other
> three nodes act as both data node and yarn node manager.
>
>
>
> The test input file is around 7GB file on the HDFS cluster and the
> replication number is 3. (This means that each data node has a copy of every
> block of the file)
>
>
>
> The mapper did nothing and didn’t write out any records:
>
>
>
> public static class KeyMapper
>
>        extends Mapper<Object, Text, Text, Text>{
>
>     public void map(Object key, Text value, Context context
>
>                     ) throws IOException, InterruptedException {
>
>
>
>     }
>
>   }
>
>
>
> So this mapper logically is reading and iterating through its splits of data
> and then finish the job.
>
> I didn’t see any factors in the above configurations that will cause the
> above phenomenon.
>
> I turned on the debug log for each mapper task and it also showed that all
> the mapper’s DFSClient read data from its local data node.
>
>
>
> Can any experts help give some hints for this? I attached the log and client
> code for analysis.
>
>
>
> Thanks,
>
> Haifeng
