Posted to user@nutch.apache.org by Doug Cutting <cu...@apache.org> on 2006/03/15 20:40:29 UTC
Re: Question on scalability
Olive g wrote:
> Is hadoop/nutch scalable at all, or can I tune some other parameters?
I'm not sure what you're asking. How long does it take to run this on a
single machine? My guess is that it's much longer. So things are
scaling: they're running faster when more hardware is added. In all
cases you're using the same number of machines, but varying parameters
and seeing different performance, as one would expect. For your current
configuration, indexing appears fastest when the number of reduce tasks
equals the number of nodes.
> I already have:
> mapred.map.tasks set to 100
> mapred.job.tracker is not local
> mapred.tasktracker.tasks.maximum is 2.
> and everything else is default.
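For reference, settings like these lived in hadoop-site.xml in Hadoop of that era. A minimal sketch of the fragment the poster describes (the jobtracker host:port is a placeholder, not from the message):

```xml
<!-- hadoop-site.xml sketch; jobtracker host:port below is a placeholder -->
<configuration>
  <property>
    <name>mapred.map.tasks</name>
    <value>100</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <!-- any value other than "local" runs jobs in distributed mode -->
    <value>master:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```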
How are you storing things? Are you using dfs?
Are your nodes single-cpu or dual-cpu? My guess is single-cpu, in which
case you might see more consistent performance with
mapred.tasktracker.tasks.maximum=1.
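As a sketch, that suggestion is a single property in hadoop-site.xml:

```xml
<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <!-- one concurrent task per node, to avoid contention on single-cpu machines -->
  <value>1</value>
</property>
```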
How many disks do you have per node? If you have multiple drives, then
configuring mapred.local.dir to contain a list of directories, one per
drive, might make things faster.
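A sketch of such a multi-drive setting, assuming two drives mounted at /d1 and /d2 (hypothetical paths):

```xml
<property>
  <name>mapred.local.dir</name>
  <!-- comma-separated list, one directory per physical drive; paths are hypothetical -->
  <value>/d1/hadoop/local,/d2/hadoop/local</value>
</property>
```

Spreading intermediate map/reduce output across drives lets the disks do I/O in parallel instead of contending on one spindle.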
Doug