Posted to user@nutch.apache.org by remi tassing <ta...@gmail.com> on 2012/01/31 10:22:51 UTC
From Nutch 1.2 to 1.4
Hi,
So I've finally decided to move to Nutch-1.4, as it seems a lot faster.
The issue I had with running versions greater than 1.2 on cygwin was
solved by the tip from Luis [1], thanks!
Now I have a couple of questions:
1. Are the segments backward compatible? I tried updatedb but I get
"skipping invalid segment"
2. With the same configuration, it seems Nutch-1.4 only fetches the
injected urls and nothing else. Is there something else to configure?
[1]:
http://lucene.472066.n3.nabble.com/Problem-running-Nutch-on-Win-7-Cygwin-td3487163.html
From Nutch 1.2 to 1.4
Posted by remi tassing <ta...@gmail.com>.
Hi,
1. Freegen won't keep the db_fetched and db_unfetched info, right?
2. I think it works. My seed was one URL: the first crawl fetched a
redirection, the second fetched one page, and from the 3rd onwards many pages.
Remi
On Tuesday, January 31, 2012, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:
> Hi Remi,
>
>
>>
>> 1. Are the segments backward compatible? I tried updatedb but I get
>> "skipping invalid segment"
>>
> In all honesty I've not tried this!
> Is it possible to use readseg -dump to get a text file then use freegen to
> generate new segments to fetch???
>
>
>> 2. With the same configuration, it seems Nutch-1.4 only fetches the
>> injected urls but nothing else. Is it smth else to configure?
>>
> Can you be more specific please? I'm not sure what you mean, can you
> provide some log output or data relating to injected and fetched urls
> within the crawldb?
>
>
>
> --
> *Lewis*
>
Re: From Nutch 1.2 to 1.4
Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi Remi,
>
> 1. Are the segments backward compatible? I tried updatedb but I get
> "skipping invalid segment"
>
In all honesty I've not tried this!
Is it possible to use readseg -dump to get a text file and then use freegen
to generate new segments to fetch?
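A minimal sketch of the middle step of that idea, untested against a real 1.2 segment: pull the URLs out of the text file written by readseg -dump and print them one per line, so the result can serve as the URL list that freegen reads. It assumes the dump marks each record's URL with a line starting with "URL:: "; the prefix constant, the function name and the script name are illustrative, not part of Nutch. Page content in the dump could in principle contain a line with the same prefix, so dumping without content is probably safer.

```python
import sys

# Assumed record marker in the text file produced by "readseg -dump".
URL_PREFIX = "URL:: "


def extract_urls(lines):
    """Return the unique URLs found in dump lines, in first-seen order."""
    seen = set()
    urls = []
    for line in lines:
        line = line.strip()
        if not line.startswith(URL_PREFIX):
            continue
        url = line[len(URL_PREFIX):].strip()
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python extract_urls.py <dump file> > urls.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for url in extract_urls(fh):
            print(url)
```

The resulting list carries URLs only, so any fetch status held in the old crawldb would not come along with it.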
> 2. With the same configuration, it seems Nutch-1.4 only fetches the
> injected urls but nothing else. Is it smth else to configure?
>
Can you be more specific please? I'm not sure what you mean, can you
provide some log output or data relating to injected and fetched urls
within the crawldb?
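As a rough illustration of the kind of data meant here: the per-status counts printed by readdb -stats could be saved to a file after each crawl round and summarised with a small script like the one below. It assumes the stats lines look like "status 1 (db_unfetched): N"; the regular expression, function name and script name are assumptions for illustration only.

```python
import re
import sys

# Assumed shape of a stats line: "status <code> (<name>): <count>".
STATUS_RE = re.compile(r"status\s+\d+\s+\((\w+)\):\s*(\d+)")


def status_counts(text):
    """Map each status name (e.g. db_fetched) to its count in the stats text."""
    return {name: int(count) for name, count in STATUS_RE.findall(text)}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python crawldb_stats.py <saved readdb -stats output>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name, count in sorted(status_counts(fh.read()).items()):
            print(f"{name}\t{count}")
```

If db_unfetched stays at zero after each updatedb, that would suggest no new links are reaching the crawldb, which would fit the "only the injected urls" symptom.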
--
*Lewis*