Posted to dev@nutch.apache.org by an...@orbita1.ru on 2005/11/21 11:22:56 UTC
mapred.map.tasks
Why does the mapred.map.tasks parameter need to be greater than the number of
available hosts? If we set it equal to the number of hosts, we get the
"negative progress percentages" problem.
RE: mapred.map.tasks
Posted by an...@orbita1.ru.
I tried to launch mapred on 2 machines: 192.168.0.250 and 192.168.0.111.
In nutch-site.xml I specified the following parameters:
1) On both machines:
<property>
<name>fs.default.name</name>
<value>192.168.0.250:9009</value>
<description>The name of the default file system. Either the
literal string "local" or a host:port for NDFS.</description>
</property>
<property>
<name>mapred.job.tracker</name>
<value>192.168.0.250:9010</value>
<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>
</property>
<property>
<name>mapred.map.tasks</name>
<value>2</value>
<description>The default number of map tasks per job. Typically set
to a prime several times greater than number of available hosts.
Ignored when mapred.job.tracker is "local".
</description>
</property>
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>2</value>
<description>The maximum number of tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.reduce.tasks</name>
<value>2</value>
<description>The default number of reduce tasks per job. Typically set
to a prime close to the number of available hosts. Ignored when
mapred.job.tracker is "local".
</description>
</property>
On 192.168.0.250 I started:
2) bin/nutch-daemon.sh start datanode
3) bin/nutch-daemon.sh start namenode
4) bin/nutch-daemon.sh start jobtracker
5) bin/nutch-daemon.sh start tasktracker
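The description of mapred.map.tasks in the configuration above recommends "a prime several times greater than number of available hosts". A minimal Python sketch of that rule of thumb follows. The function names and the default multiplier of 3 are assumptions made for illustration ("several times" is not quantified in the description), and nothing here is part of Nutch itself.

```python
def is_prime(n: int) -> bool:
    """Trial-division primality check; fine for small task counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def suggest_map_tasks(hosts: int, multiplier: int = 3) -> int:
    """Return the smallest prime >= hosts * multiplier.

    `multiplier` is a hypothetical knob standing in for the
    "several times" wording of the property description.
    """
    candidate = max(2, hosts * multiplier)
    while not is_prime(candidate):
        candidate += 1
    return candidate


# For the 2-host setup described in this message:
print(suggest_map_tasks(2))
```

With a multiplier of 1 the function returns 2 for two hosts, which is the value used in the configuration above.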
I created a directory named seeds with a file named urls in it. The urls file contained 2 links.
Then I added that directory to NDFS (bin/nutch ndfs -put ./seeds seeds).
The directory was added successfully.
Then I ran the command:
bin/nutch crawl seeds -depth 2
As a result, the jobtracker wrote this log:
....
051123 053118 Adding task 'task_m_z66npx' to set for tracker 'tracker_53845'
051123 053118 Adding task 'task_m_xaynqo' to set for tracker 'tracker_11518'
051123 053130 Task 'task_m_z66npx' has finished successfully.
Log written by tasktracker on 192.168.0.111:
......
051110 142607 task_m_z66npx 0.0% /user/root/seeds/urls:0+31
051110 142607 task_m_z66npx 1.0% /user/root/seeds/urls:0+31
051110 142607 Task task_m_z66npx is done.
Log written by tasktracker on 192.168.0.250:
....
051123 053125 task_m_xaynqo 0.12903225% /user/root/seeds/urls:31+31
051123 053126 task_m_xaynqo -683.9677% /user/root/seeds/urls:31+31
051123 053127 task_m_xaynqo -2129.9678% /user/root/seeds/urls:31+31
051123 053128 task_m_xaynqo -3483.0322% /user/root/seeds/urls:31+31
051123 053129 task_m_xaynqo -4976.2256% /user/root/seeds/urls:31+31
051123 053130 task_m_xaynqo -6449.1934% /user/root/seeds/urls:31+31
051123 053131 task_m_xaynqo -7898.258% /user/root/seeds/urls:31+31
051123 053132 task_m_xaynqo -9232.193% /user/root/seeds/urls:31+31
051123 053133 task_m_xaynqo -10694.3545% /user/root/seeds/urls:31+31
051123 053134 task_m_xaynqo -12139.226% /user/root/seeds/urls:31+31
051123 053135 task_m_xaynqo -13416.677% /user/root/seeds/urls:31+31
051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31
... and so on, i.e. the log kept recording ever-decreasing percentages.
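For anyone trying to spot this in their own tasktracker logs, here is a rough Python sketch that picks out progress lines reporting a negative percentage. The regular expression is only inferred from the log lines quoted in this message, so treat the line format as an assumption; the function name is hypothetical as well.

```python
import re

# Line layout inferred from the excerpts above (an assumption):
#   <date> <time> <task id> <percent>% <path>:<start>+<length>
PROGRESS_LINE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{6}) (?P<task>task_\w+) "
    r"(?P<percent>-?\d+(?:\.\d+)?)% "
    r"(?P<path>\S+):(?P<start>\d+)\+(?P<length>\d+)$"
)


def negative_progress(lines):
    """Yield (task, percent) for each progress line reporting less than 0%."""
    for line in lines:
        match = PROGRESS_LINE.match(line.strip())
        if match and float(match.group("percent")) < 0:
            yield match.group("task"), float(match.group("percent"))


# Sample lines copied from the logs in this message.
sample = [
    "051110 142607 task_m_z66npx 1.0% /user/root/seeds/urls:0+31",
    "051123 053125 task_m_xaynqo 0.12903225% /user/root/seeds/urls:31+31",
    "051123 053126 task_m_xaynqo -683.9677% /user/root/seeds/urls:31+31",
    "051110 142607 Task task_m_z66npx is done.",
]

for task, percent in negative_progress(sample):
    print(task, percent)
```

Lines that do not match the pattern, such as the "is done" message, are skipped silently.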
I concluded that this was an attempt to split the inject step across the 2
machines, i.e. there were 2 tasks: 'task_m_z66npx' and 'task_m_xaynqo'.
'task_m_z66npx' finished successfully, while 'task_m_xaynqo' ran into
problems (negative progress).
But if I change the mapred.reduce.tasks parameter to 4, all tasks finish
successfully and everything works correctly.
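To make the "urls:0+31" and "urls:31+31" labels in the logs easier to read, here is a small Python sketch of how a file could be cut into start+length byte ranges, one per map task. This is not Nutch's actual splitting code. The function name is hypothetical, and the 62-byte file length is inferred from the two logged ranges rather than stated anywhere.

```python
def split_offsets(file_length: int, num_splits: int):
    """Return (start, length) pairs that together cover [0, file_length)."""
    base = file_length // num_splits
    splits = []
    start = 0
    for i in range(num_splits):
        # The last split absorbs any remainder bytes.
        length = base if i < num_splits - 1 else file_length - start
        splits.append((start, length))
        start += length
    return splits


# Assumed 62-byte urls file, 2 map tasks (mapred.map.tasks = 2).
for start, length in split_offsets(62, 2):
    print(f"/user/root/seeds/urls:{start}+{length}")
```

Under these assumptions the two ranges match the ones in the tasktracker logs: the first task reads bytes 0 to 30, the second reads bytes 31 to 61.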
-----Original Message-----
From: Doug Cutting [mailto:cutting@nutch.org]
Sent: Tuesday, November 22, 2005 2:10 AM
To: nutch-dev@lucene.apache.org
Subject: Re: mapred.map.tasks
anton@orbita1.ru wrote:
> Why we need parameter mapred.map.tasks greater than number of available
> host? If we set it equal to number of host, we got "negative progress
> percentages" problem.
Can you please post a simple example that demonstrates the "negative
progress" problem? E.g., the minimal changes to your conf/ directory
required to illustrate this, how you start your daemons, etc.
Thanks,
Doug
Re: mapred.map.tasks
Posted by Doug Cutting <cu...@nutch.org>.
anton@orbita1.ru wrote:
> Why we need parameter mapred.map.tasks greater than number of available
> host? If we set it equal to number of host, we got "negative progress
> percentages" problem.
Can you please post a simple example that demonstrates the "negative
progress" problem? E.g., the minimal changes to your conf/ directory
required to illustrate this, how you start your daemons, etc.
Thanks,
Doug