Posted to user@nutch.apache.org by "Moore, Lee C" <Le...@xerox.com> on 2007/11/19 21:41:01 UTC
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09096.html
Hello:
I am trying to do recrawling with Nutch-0.9. I have done some Google
searching but I haven't found an answer that works.
I had hopes for the script located at:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09096.html
I tried this script for re-crawling and it has the same problem after a
couple of re-crawls:
----- Merge Indexes (Step 7 of 8) -----
merging indexes to: crawl/index
Adding crawl/NEWindexes/part-00000
IndexMerger: java.io.IOException: Target crawl/index/merge-output already exists
(Also, this script has an unrelated bug: it references the variable
$rank, but $rank is not defined. I guess this is supposed to be topN.)
Has anybody found a solution for successfully re-crawling with 0.9?
thanks,
-Lee
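[Editorial sketch, not part of the original message or of the script under discussion.] The "Target crawl/index/merge-output already exists" failure above seems consistent with merging into an index directory left over from an earlier run. A minimal sketch of one way to avoid that is to merge into a fresh directory and swap it in only afterwards. The helper name merge_and_swap and the crawl/NEWindex directory are assumptions made for illustration, and the example call assumes that bin/nutch merge takes the output index followed by the directory of part indexes. The set -u line is likewise an assumption about how an undefined variable such as $rank could be caught early.

```shell
#!/bin/sh
set -eu  # -u turns a reference to an undefined variable (e.g. $rank) into an error

# Hypothetical helper: merge part indexes into a fresh directory, then swap it in.
# $1 = crawl directory; remaining arguments = the merge command to run.
merge_and_swap() {
  crawl_dir=$1
  shift
  new_index="$crawl_dir/NEWindex"   # assumed name for the temporary merge target

  # Remove leftovers from an earlier failed run so the merge target does not exist.
  rm -rf "$new_index"

  # Run the merge command with the output index and the part-index directory appended.
  "$@" "$new_index" "$crawl_dir/NEWindexes"

  # Replace the live index only after the merge has succeeded (set -e aborts otherwise).
  rm -rf "$crawl_dir/index"
  mv "$new_index" "$crawl_dir/index"
}

# Example call (assumes NUTCH_HOME is set):
# merge_and_swap crawl "$NUTCH_HOME/bin/nutch" merge
```

Passing the merge command as arguments keeps the sketch runnable without a Nutch installation; a real script would probably also back up the old index before deleting it.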
RE: http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09096.html
Posted by "Moore, Lee C" <Le...@xerox.com>.
Hi Susam Pal,
Thanks for the pointer to the latest crawl/recrawl script. It has worked
very well with Nutch-0.9. It is the answer to my problem!
Thanks again!
-Lee
-----Original Message-----
From: Susam Pal [mailto:susam.pal@gmail.com]
Sent: Tuesday, November 20, 2007 1:39 AM
To: nutch-user@lucene.apache.org
Subject: Re: http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09096.html
Hi Lee,
Thanks for the feedback. The script posted in the mailing list has
some bugs. Please use the latest script from
http://wiki.apache.org/nutch/Crawl
I have also made some minor changes to make it work with Nutch 1.0-dev
in trunk. I have tested this with Nutch 1.0-dev. I believe this should
work fine for Nutch 0.9 too.
We had a discussion on re-crawling for Nutch 1.0-dev here:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09514.html
Please try this script for re-crawling with Nutch-0.9 and let us know
how it goes.
Regards,
Susam Pal
Re: http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09096.html
Posted by Susam Pal <su...@gmail.com>.
Hi Lee,
Thanks for the feedback. The script posted in the mailing list has
some bugs. Please use the latest script from
http://wiki.apache.org/nutch/Crawl
I have also made some minor changes to make it work with Nutch 1.0-dev
in trunk. I have tested this with Nutch 1.0-dev. I believe this should
work fine for Nutch 0.9 too.
We had a discussion on re-crawling for Nutch 1.0-dev here:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg09514.html
Please try this script for re-crawling with Nutch-0.9 and let us know
how it goes.
Regards,
Susam Pal