Posted to common-user@hadoop.apache.org by Martin Traverso <mt...@gmail.com> on 2007/10/02 22:54:53 UTC
Slow maps
Hi all,
I just got started running Hadoop, and I'm seeing extremely low map
performance.
I'm running the grep example over about 8.3 GB of data (~23 million lines),
and the map step takes more than 3 hours to complete. During that time,
Hadoop consumes two entire CPUs on each of the slave nodes. As a point of
comparison, processing the same files with Unix grep from the command line
takes just 10 minutes.
My setup is as follows:
Hadoop 0.14.1, r571288
Java 1.5.0_07
5-node cluster (including namenode/jobtracker), each node with 4 CPU cores
All nodes connected to a Gigabit switch
I'm using the default Hadoop config plus the following overrides:
mapred.map.tasks = 27
mapred.reduce.tasks = 11
mapred.child.java.opts = -Xmx512M
This is the output of 'hadoop fsck':
Status: HEALTHY
Total size: 8934685280 B
Total blocks: 876 (avg. block size 10199412 B)
Total dirs: 1
Total files: 876
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Target replication factor: 3
Real replication factor: 3.0
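As a sanity check on the fsck totals above, here is a small Python sketch (not part of the Hadoop tooling) that works out how the input is laid out. It assumes the default 64 MB HDFS block size, which is not stated in this post, and that the input format creates at least one split per file, in which case mapred.map.tasks = 27 acts only as a hint.

```python
# Values copied from the 'hadoop fsck' output above.
total_size_bytes = 8_934_685_280
total_files = 876
total_blocks = 876

# Assumption: default HDFS block size of 64 MB (not stated in the post).
assumed_block_size = 64 * 1024 * 1024

avg_file_size = total_size_bytes / total_files
blocks_per_file = total_blocks / total_files

# Assumption: each file yields at least one input split, so the job
# runs at least one map task per file regardless of mapred.map.tasks.
min_map_tasks = total_files

print(f"average file size: {avg_file_size / 1e6:.1f} MB")
print(f"blocks per file:   {blocks_per_file:.1f}")
print(f"minimum map tasks: {min_map_tasks} (requested: 27)")
```

If those assumptions hold, the job is made up of many small map tasks of roughly 10 MB each rather than the 27 requested, so per-task startup cost may matter as much as raw scan speed.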
It takes about 3 minutes to complete one map with the following counters:
Map input records            53,679
Map output records            1,973
Map input bytes          20,118,818
Map output bytes            608,890
Combine input records         1,973
Combine output records        1,971
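To put these counters in perspective, the following Python sketch turns them into a rough per-map throughput and compares it with the command-line grep run mentioned earlier. The timings are the approximate figures from this post ("about 3 minutes" per map, "10 minutes" for grep), so the result is an order-of-magnitude estimate only, and it compares a single map task with a single grep process.

```python
# Counters for one map task, copied from above.
map_input_bytes = 20_118_818
map_input_records = 53_679
map_seconds = 3 * 60  # "about 3 minutes" per map (approximate)

# Command-line grep baseline: whole data set in about 10 minutes.
total_size_bytes = 8_934_685_280  # "Total size" from fsck
grep_seconds = 10 * 60

map_bytes_per_sec = map_input_bytes / map_seconds
map_records_per_sec = map_input_records / map_seconds
grep_bytes_per_sec = total_size_bytes / grep_seconds

# How many times faster a single grep process scans than one map task.
slowdown = grep_bytes_per_sec / map_bytes_per_sec

print(f"map task:  {map_bytes_per_sec / 1024:.0f} KiB/s, "
      f"{map_records_per_sec:.0f} records/s")
print(f"unix grep: {grep_bytes_per_sec / 1024 / 1024:.1f} MiB/s")
print(f"slowdown:  about {slowdown:.0f}x per task")
```

On these numbers a single map task reads on the order of 100 KiB/s, far below what disk or network limits would explain, which suggests the time is going into per-record processing or task overhead rather than I/O.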
Would somebody give me a couple of pointers on where to start
troubleshooting this problem?
Thanks in advance!
Martin