Posted to user@nutch.apache.org by remi tassing <ta...@gmail.com> on 2012/02/14 19:09:20 UTC

From Nutch 1.2 to 1.4

Hi,

1. freegen won't keep the db_fetched and db_unfetched info, right?
2. I think it works. My seed was a single URL: the first crawl only followed a
redirect, the second fetched one page, and from the third onwards many pages
were fetched.
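
For the readseg/freegen route suggested below, here is a rough sketch of the
step in between, not something I have tested against a real segment. It
assumes that `bin/nutch readseg -dump <segment> <outdir>` writes a text dump
in which each record has a line of the form "URL:: <url>", and that freegen
reads plain text files with one URL per line. The function names and file
paths are made up for illustration. Only the URLs are extracted, so any
fetch status in the dump is not carried over.

```python
import re
from pathlib import Path

# Assumption: every record in the readseg text dump has a line "URL:: <url>".
URL_LINE = re.compile(r"^URL::\s*(\S+)\s*$")


def extract_urls(dump_text):
    """Return the unique URLs found in the dump text, in first-seen order."""
    seen = set()
    urls = []
    for line in dump_text.splitlines():
        match = URL_LINE.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            urls.append(match.group(1))
    return urls


def write_url_list(dump_path, out_path):
    """Write one URL per line to out_path and return how many were written."""
    text = Path(dump_path).read_text(encoding="utf-8", errors="replace")
    urls = extract_urls(text)
    Path(out_path).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return len(urls)
```

The resulting file would go into a directory that is then passed to
`bin/nutch freegen <urldir> <segmentsdir>`.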

Remi

On Tuesday, January 31, 2012, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:
> Hi Remi,
>
>
>>
>>   1. Are the segments backward compatible? I tried updatedb but I get
>>   "skipping invalid segment"
>>
> In all honesty I've not tried this!
> Is it possible to use readseg -dump to get a text file, then use freegen to
> generate new segments to fetch?
>
>
>>   2. With the same configuration, it seems Nutch-1.4 only fetches the
>>   injected URLs but nothing else. Is there something else to configure?
>>
> Can you be more specific please? I'm not sure what you mean. Can you
> provide some log output or data relating to injected and fetched URLs
> within the crawldb?
>
>
>
> --
> *Lewis*
>