Posted to user@nutch.apache.org by remi tassing <ta...@gmail.com> on 2012/01/31 10:22:51 UTC

From Nutch 1.2 to 1.4

Hi,

So I've finally decided to move to Nutch-1.4; it seems a lot faster.

The issue I had running versions later than 1.2 on Cygwin was
solved by the tip from Luis [1], thanks!

Now I have a couple of questions:


   1. Are the segments backward compatible? I tried updatedb but I get
   "skipping invalid segment"
   2. With the same configuration, it seems Nutch-1.4 only fetches the
   injected URLs and nothing else. Is there something else to configure?


[1]:
http://lucene.472066.n3.nabble.com/Problem-running-Nutch-on-Win-7-Cygwin-td3487163.html

From Nutch 1.2 to 1.4

Posted by remi tassing <ta...@gmail.com>.
Hi,

1. Freegen won't keep the db_fetched and db_unfetched info, right?
2. I think it works. My seed was one URL: the first crawl fetched a
redirect, the second fetched one page, and from the third onwards many pages.

Remi

On Tuesday, January 31, 2012, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:
> Hi Remi,
>
>
>>
>>   1. Are the segments backward compatible? I tried updatedb but I get
>>   "skipping invalid segment"
>>
> In all honesty, I've not tried this!
> Would it be possible to use readseg -dump to get a text file and then use
> freegen to generate new segments to fetch?
>
>
>>   2. With the same configuration, it seems Nutch-1.4 only fetches the
>>   injected URLs and nothing else. Is there something else to configure?
>>
> Can you be more specific, please? I'm not sure what you mean. Can you
> provide some log output or data relating to injected and fetched URLs
> within the crawldb?
>
>
>
> --
> *Lewis*
>

Re: From Nutch 1.2 to 1.4

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Remi,


>
>   1. Are the segments backward compatible? I tried updatedb but I get
>   "skipping invalid segment"
>
In all honesty, I've not tried this!
Would it be possible to use readseg -dump to get a text file and then use
freegen to generate new segments to fetch?
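
The step between the two commands (turning the readseg text dump into a
plain list of URLs that freegen can read) could be sketched roughly as
below. This is only an untested-against-1.2 sketch: it assumes each record in
the dump carries a line of the form "URL:: <url>", and the function name
extract_urls and the sample lines are made up for illustration.

```python
import re
from typing import Iterable, List

# Assumption: the readseg text dump marks each record's URL with a
# line such as "URL:: http://example.com/". Adjust the pattern if your
# dump looks different.
URL_LINE = re.compile(r"^URL::\s+(\S+)\s*$")


def extract_urls(dump_lines: Iterable[str]) -> List[str]:
    """Return the unique URLs found in the dump, in first-seen order."""
    seen = set()
    urls = []
    for line in dump_lines:
        match = URL_LINE.match(line)
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            urls.append(match.group(1))
    return urls


# Hypothetical sample input, standing in for the lines of a real dump file.
sample = [
    "Recno:: 0\n",
    "URL:: http://example.com/\n",
    "\n",
    "Recno:: 1\n",
    "URL:: http://example.com/\n",
    "Recno:: 2\n",
    "URL:: http://example.org/a\n",
]

if __name__ == "__main__":
    for url in extract_urls(sample):
        print(url)
```

Writing the resulting URLs one per line into a text file inside an otherwise
empty directory should give freegen an input directory to work from.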


>   2. With the same configuration, it seems Nutch-1.4 only fetches the
>   injected URLs and nothing else. Is there something else to configure?
>
Can you be more specific, please? I'm not sure what you mean. Can you
provide some log output or data relating to injected and fetched URLs
within the crawldb?



-- 
*Lewis*