
Posted to common-user@hadoop.apache.org by Richard Zhang <ri...@gmail.com> on 2008/07/10 11:13:14 UTC

!!!Help: Strange difference on the number of maps in HDFS and local file system

Hello Hadoopers:

I am trying to run the same map reduce job on HDFS and on the local file
system. That is, one time I run the job on HDFS, and another time I run the
same job with the same input data on a local ext3 file system without using
HDFS. I found that the number of maps generated on the local file system is
always much larger than with HDFS.
This seems strange to me, because the number of maps is decided by the
number of splits of the given map reduce input. I use the same map reduce
job and the same input data, which should be split into the same number of
pieces, and thus the same number of maps should be generated in both cases.
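
To make the reasoning about splits concrete, here is a minimal sketch of how
a file-based input format might count splits for one file. It is a Python
model of the arithmetic, not Hadoop's own code. The formula (split size =
max(min size, min(goal size, block size)), with a 10% slop on the last
split) and the 64 MB and 32 MB block sizes are assumptions used for
illustration; the function and parameter names are hypothetical.

```python
def count_splits(file_size, block_size, requested_maps=1,
                 min_split_size=1, slop=1.1):
    """Approximate the number of input splits for a single file.

    Assumed rule: split_size = max(min_split_size, min(goal_size, block_size)),
    where goal_size is the file size divided by the requested number of maps.
    """
    goal_size = file_size // max(requested_maps, 1)
    split_size = max(min_split_size, min(goal_size, block_size))

    splits = 0
    remaining = file_size
    # Cut full-size splits while more than `slop` split sizes remain.
    while remaining / split_size > slop:
        splits += 1
        remaining -= split_size
    # Whatever is left over becomes the final split.
    if remaining > 0:
        splits += 1
    return splits


MB = 1024 * 1024
file_size = 1024 * MB  # hypothetical 1 GB input file

print(count_splits(file_size, 64 * MB))  # assumed 64 MB block size
print(count_splits(file_size, 32 * MB))  # assumed 32 MB block size
```

Under these assumptions the same file yields 16 splits with a 64 MB block
size and 32 splits with a 32 MB block size, so the split count depends on
the block size the file system reports as well as on the input data.
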
It seems to me that the major difference, if there is one, should be that
HDFS needs to copy the input data into the HDFS sequence file format. But
that should not affect the number of splits of the file.
Why does this happen? Has anyone encountered this before? Any insight into
this phenomenon?

Thanks.
Richard.