Posted to user@nutch.apache.org by Kevin MacDonald <ke...@hautesecure.com> on 2008/09/15 18:32:05 UTC

Fetcher vs. Fetcher2

How does one configure Nutch to use Fetcher2 instead of Fetcher? I've been
digging through the configuration files but have been unable to figure it out.

Thanks!

Kevin

Re: Fetcher vs. Fetcher2

Posted by Kevin MacDonald <ke...@hautesecure.com>.
FYI, I did not find Fetcher2 useful. I am crawling URLs to a depth of 1 only
to enumerate the links on each URL I crawl, and I'm trying to optimize for
that. Fetcher2 seemed significantly slower than Fetcher, and it also aborted
with all of its threads hung, for reasons unknown.

Kevin

On Mon, Sep 15, 2008 at 11:08 AM, Kevin MacDonald <ke...@hautesecure.com> wrote:

> If you look at the bin/nutch shell script you can see where Fetcher or
> Fetcher2 is used, so it's just a matter of choosing which one you want to
> use. In my case I am not using the shell script. Rather, I have taken most
> of the code from org.apache.nutch.crawl.Crawl.main() and modified it a bit
> to suit my own purposes. In Crawl.main() you can see where Fetcher is used.
> I just swapped it out for Fetcher2. This is all kind of experimental at the
> moment. I am trying to optimize what nutch does by only doing the minimum
> work necessary for my application.
>
> Kevin

Re: Fetcher vs. Fetcher2

Posted by Kevin MacDonald <ke...@hautesecure.com>.
If you look at the bin/nutch shell script, you can see where Fetcher or
Fetcher2 is invoked, so it's just a matter of choosing which one you want to
use. In my case I am not using the shell script. Instead, I have taken most
of the code from org.apache.nutch.crawl.Crawl.main() and modified it a bit
to suit my own purposes. In Crawl.main() you can see where Fetcher is used;
I just swapped it out for Fetcher2. This is all experimental at the moment:
I am trying to optimize Nutch so that it does only the minimum work my
application needs.
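One way to keep that swap down to a single switch in a driver copied from Crawl.main() can be sketched as follows. This is a minimal, self-contained sketch, not Nutch code: `CrawlDriver`, `FetchStep`, `FetcherStep` and `Fetcher2Step` are hypothetical names, and the two step classes only record a log line where a real driver would call Nutch's Fetcher or Fetcher2 (check the actual fetch method signatures in your own Nutch source tree; they are not reproduced here). The loop only approximates the generate, fetch, updatedb cycle that Crawl.main() repeats once per depth level.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driver loosely modeled on a copy of Crawl.main().
// Not Nutch code: every name here is illustrative only.
class CrawlDriver {
    // The single place where the fetcher is chosen, e.g. from a
    // command-line argument or a (hypothetical) setting the driver reads.
    static FetchStep choose(String which) {
        switch (which) {
            case "fetcher":
                return new FetcherStep();
            case "fetcher2":
                return new Fetcher2Step();
            default:
                throw new IllegalArgumentException("unknown fetcher: " + which);
        }
    }

    // Approximates the per-depth cycle in Crawl.main(): generate a segment,
    // fetch it, then update the crawl db. Returns a log of the steps taken.
    static List<String> run(FetchStep fetcher, int depth, int threads) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < depth; i++) {
            String segment = "segment-" + i; // stands in for the generator's output
            log.add("generate " + segment);
            fetcher.fetch(segment, threads, log);
            log.add("updatedb " + segment);
        }
        return log;
    }

    public static void main(String[] args) {
        // Pass "fetcher" or "fetcher2"; defaults to fetcher2.
        String which = args.length > 0 ? args[0] : "fetcher2";
        for (String step : run(choose(which), 1, 10)) {
            System.out.println(step);
        }
    }
}

// Hypothetical seam: Nutch's Fetcher and Fetcher2 do not share this interface.
interface FetchStep {
    void fetch(String segment, int threads, List<String> log);
}

// Placeholder: a real driver would call Nutch's Fetcher here.
class FetcherStep implements FetchStep {
    public void fetch(String segment, int threads, List<String> log) {
        log.add("Fetcher " + segment + " threads=" + threads);
    }
}

// Placeholder: a real driver would call Nutch's Fetcher2 here.
class Fetcher2Step implements FetchStep {
    public void fetch(String segment, int threads, List<String> log) {
        log.add("Fetcher2 " + segment + " threads=" + threads);
    }
}
```

With a depth of 1 (the case described above), running it prints three lines: the generate step, the chosen fetcher's line, and the updatedb step. Keeping the choice in one method means trying Fetcher2 and falling back to Fetcher is a one-word change rather than an edit inside the crawl loop.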

Kevin

On Mon, Sep 15, 2008 at 10:40 AM, David Grandinetti <db...@gmail.com> wrote:

> So, what was the solution?

Re: Fetcher vs. Fetcher2

Posted by David Grandinetti <db...@gmail.com>.
So, what was the solution?

On Sep 15, 2008, at 18:22, "Kevin MacDonald" <ke...@hautesecure.com> wrote:

> Never mind. I answered my own question. :-)

Re: Fetcher vs. Fetcher2

Posted by Kevin MacDonald <ke...@hautesecure.com>.
Never mind. I answered my own question. :-)

On Mon, Sep 15, 2008 at 9:32 AM, Kevin MacDonald <ke...@hautesecure.com> wrote:

> How does one configure nutch to use Fetcher2 instead of Fetcher? I've been
> digging through configuration files but have been unable to figure it out.
>
> Thanks!
>
> Kevin
>