Posted to user@nutch.apache.org by derevo <da...@inbox.ru> on 2007/05/10 13:14:37 UTC
fetch single host
hi,
(2 servers, hadoop + nutch)
I am trying to fetch my host, which serves txt files ( http://site.net/file_1.txt ).
There are more than 150,000 txt files.
When I start the fetch and watch the access.log file on the target host, I see only
one slave host doing the fetching (SLAVE_1).
When I restart the fetch, the fetching slave host is now (SLAVE_2).
The Task Tracker status shows the same result.
Why do they not work together?
Re: fetch single host
Posted by Sami Siren <ss...@gmail.com>.
derevo wrote:
> hi,
> (2 servers, hadoop + nutch)
>
> I am trying to fetch my host, which serves txt files ( http://site.net/file_1.txt ).
> There are more than 150,000 txt files.
> When I start the fetch and watch the access.log file on the target host, I see only
> one slave host doing the fetching (SLAVE_1).
> When I restart the fetch, the fetching slave host is now (SLAVE_2).
>
> The Task Tracker status shows the same result.
The fetchlist is by default partitioned so that all URLs for the same host
end up being fetched by a single node; see PartitionUrlByHost.
To override this you would need to change the partitioner or stop using
it (both would require source code changes).
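To illustrate the effect (this is a simplified sketch of the idea, not Nutch's actual PartitionUrlByHost source): when the partition number is derived from the host name alone, every URL sharing that host maps to the same fetch task, so with a single target host only one slave ever fetches.

```java
import java.net.URL;

// Sketch of host-based partitioning, in the spirit of Hadoop's
// Partitioner#getPartition. All URLs with the same host hash to the
// same bucket, regardless of how many fetch nodes exist.
public class HostPartitionSketch {

    // Map a URL to one of numPartitions buckets using only its host.
    static int getPartition(String url, int numPartitions) throws Exception {
        String host = new URL(url).getHost();
        return (host.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) throws Exception {
        int nodes = 2; // e.g. SLAVE_1 and SLAVE_2
        // 150,000 files, but all on one host: every URL lands in the
        // same partition, so a single slave does all the fetching.
        int p1 = getPartition("http://site.net/file_1.txt", nodes);
        int p2 = getPartition("http://site.net/file_99999.txt", nodes);
        System.out.println(p1 == p2); // same host => same node
    }
}
```

This per-host grouping is deliberate: it lets politeness limits (crawl delay, max connections per host) be enforced locally on one node instead of being coordinated across the cluster.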
--
Sami Siren