Posted to user@nutch.apache.org by Talat UYARER <ta...@agmlab.com> on 2013/07/25 08:40:29 UTC
Duplicate Fetches for Fetch Job
Hi,
We are using Nutch for high-volume crawls. We noticed that the FetcherJob
ReduceTask fetches some websites multiple times when queues run for a
long time. I found that the cause is the
mapred.reduce.tasks.speculative.execution setting in Hadoop, which
defaults to true. I suggest setting this value to false for FetcherJob.
What do you think?
Talat
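For readers unfamiliar with the mechanism: when speculative execution is enabled, Hadoop may launch a backup attempt of a slow task and keep whichever attempt finishes first. The output of the losing attempt is discarded, but any HTTP requests it already made cannot be undone, which is consistent with the duplicate fetches described above. The following is a toy model only, not Nutch or Hadoop code. The class SpeculativeFetchSketch, the method runFetchTask, the `slow` flag, and the plain Map standing in for a Hadoop Configuration are all assumptions made for illustration; only the property name comes from the message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Hadoop or Nutch code) of fetches under speculation. */
public class SpeculativeFetchSketch {

    // Property name taken from the thread; the Map below is only a stand-in
    // for a Hadoop Configuration object.
    static final String SPECULATIVE_KEY = "mapred.reduce.tasks.speculative.execution";

    /**
     * Simulates one reduce task and returns a log of every fetch performed
     * across all attempts. Assumption: a slow task gets exactly one backup
     * attempt when speculation is enabled.
     */
    static List<String> runFetchTask(Map<String, String> conf, List<String> urls, boolean slow) {
        // The thread states that the setting defaults to true.
        boolean speculative = Boolean.parseBoolean(conf.getOrDefault(SPECULATIVE_KEY, "true"));
        int attempts = (speculative && slow) ? 2 : 1;

        List<String> fetchLog = new ArrayList<>();
        for (int attempt = 0; attempt < attempts; attempt++) {
            for (String url : urls) {
                // A fetch is a side effect: discarding an attempt's output
                // does not undo the request it already sent.
                fetchLog.add(url);
            }
        }
        return fetchLog;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://example.com/a", "http://example.com/b");
        Map<String, String> conf = new HashMap<>();

        // Speculation left at its default (true): each URL is fetched twice.
        System.out.println(runFetchTask(conf, urls, true).size()); // prints 4

        // Speculation disabled: each URL is fetched once.
        conf.put(SPECULATIVE_KEY, "false");
        System.out.println(runFetchTask(conf, urls, true).size()); // prints 2
    }
}
```

In this model the duplicates only appear for slow tasks, which matches the observation that long-lasting queues are the ones affected.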
Re: Duplicate Fetches for Fetch Job
Posted by Talat UYARER <ta...@agmlab.com>.
Thanks, Tejas. I had some hesitation at first; I will go ahead and open an
issue and upload a patch.
On 25-07-2013 at 10:12, Tejas Patil wrote:
> 1.x has speculative execution turned off:
> Fetcher.java:1328: job.setSpeculativeExecution(false);
>
> but 2.x doesn't. It makes sense to do that. I don't see any good reason to
> not have it in 2.x. Could you open a jira for this and upload a patch?
>
>
> On Wed, Jul 24, 2013 at 11:40 PM, Talat UYARER <ta...@agmlab.com> wrote:
>
>> Hi,
>>
>> We are using nutch for high volume crawls. We noticed that FetcherJob
>> ReduceTask fetches some websites multiple times for long lasting queues. I
>> have discovered the reason of this is mapred.reduce.tasks.speculative.execution
>> settings in hadoop. This comes true as default. I suggest this value should
>> be false for FetcherJob. What do you think?
>>
>> Talat
>>
Re: Duplicate Fetches for Fetch Job
Posted by Tejas Patil <te...@gmail.com>.
1.x has speculative execution turned off:
Fetcher.java:1328: job.setSpeculativeExecution(false);
but 2.x doesn't. It makes sense to do that. I don't see any good reason
not to have it in 2.x. Could you open a JIRA issue for this and upload a patch?
On Wed, Jul 24, 2013 at 11:40 PM, Talat UYARER <ta...@agmlab.com> wrote:
> Hi,
>
> We are using nutch for high volume crawls. We noticed that FetcherJob
> ReduceTask fetches some websites multiple times for long lasting queues. I
> have discovered the reason of this is mapred.reduce.tasks.speculative.execution
> settings in hadoop. This comes true as default. I suggest this value should
> be false for FetcherJob. What do you think?
>
> Talat
>