Posted to user@nutch.apache.org by Ali rahmani <al...@yahoo.com> on 2014/05/24 11:13:44 UTC

Recrawling in nutch 2.x

Dear Guys,
we are working on a search engine, and we have to use version 2.x (due to its ability to connect to HBase). We tried dozens of re-crawling scripts, but none of them works. Is there any re-crawling script for Nutch 2.x?
We also added "db.fetch.interval.default" to the "nutch-site.xml" file, but it does not have any positive effect.
Regards,
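
For readers following the thread: "db.fetch.interval.default" is given in seconds, and a page only becomes eligible for re-fetching once that interval has passed since its last fetch. The sketch below illustrates that rule in Python. It is not Nutch code; the names `next_fetch_time` and `is_due` and the one-day interval are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical interval for illustration: one day, expressed in seconds,
# the same unit that db.fetch.interval.default uses.
FETCH_INTERVAL_SECONDS = 86400


def next_fetch_time(last_fetch, interval_seconds=FETCH_INTERVAL_SECONDS):
    """Return the earliest time at which the page should be fetched again."""
    return last_fetch + timedelta(seconds=interval_seconds)


def is_due(last_fetch, now, interval_seconds=FETCH_INTERVAL_SECONDS):
    """A page is due for re-fetching once its next fetch time has been reached."""
    return now >= next_fetch_time(last_fetch, interval_seconds)


if __name__ == "__main__":
    last = datetime(2014, 5, 24, 11, 0, 0)
    # 12 hours after the last fetch: not yet due with a one-day interval.
    print(is_due(last, datetime(2014, 5, 24, 23, 0, 0)))
    # 24 hours after the last fetch: due.
    print(is_due(last, datetime(2014, 5, 25, 11, 0, 0)))
```

Under this model, re-running a crawl script before the interval has elapsed selects nothing new to fetch, which may look as if the setting had no effect.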

Re: Recrawling in nutch 2.x

Posted by Ali rahmani <al...@yahoo.com>.
Hi Talat,
We are trying to monitor more than 3,000 news sites, and we need to re-crawl their main pages and store any newly added links.
In this process, we only need to crawl the first two depths of each site (the main page and its links), and the final crawl result must not contain any duplicate URLs.
Regards, 
Ali
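
The requirement described above (re-read each main page, follow its links one level down, and keep only URLs not stored before) can be sketched as a depth-limited breadth-first walk with a set of already-seen URLs. This is a minimal illustration in Python, not a Nutch script; `crawl_two_levels`, the `get_links` callback, and the `news.example` URLs are all hypothetical.

```python
from collections import deque


def crawl_two_levels(seeds, get_links, max_depth=2, seen=None):
    """Walk seeds breadth-first down to max_depth and return only new URLs.

    Depth 1 is a site's main page and depth 2 is the pages it links to.
    `seen` persists between runs so that a re-crawl reports each URL once.
    """
    seen = set() if seen is None else seen
    new_urls = []
    queue = deque((url, 1) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url not in seen:
            seen.add(url)
            new_urls.append(url)
        # Pages above the depth limit are always re-read for fresh links,
        # even when the page itself is already known.
        if depth < max_depth:
            for link in get_links(url):
                queue.append((link, depth + 1))
    return new_urls


if __name__ == "__main__":
    # Hypothetical in-memory "site" standing in for real fetching and parsing.
    main = "http://news.example/"
    site = {main: [main + "a", main + "b", main + "a"]}
    seen = set()
    print(crawl_two_levels([main], lambda u: site.get(u, []), seen=seen))
    # A new article appears on the main page before the next run.
    site[main].append(main + "c")
    print(crawl_two_levels([main], lambda u: site.get(u, []), seen=seen))
```

Keeping the `seen` set outside the function is the design choice that makes the second run return only the newly added link.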

On Monday, May 26, 2014 8:17:26 AM, Talat Uyarer <ta...@uyarer.com> wrote:

Hi Ali,

Can you explain what you expect from recrawling? Do you want
to set the next fetch time, or do you want to rerun your crawler?

Talat
On 24 May 2014 at 12:14, "Ali rahmani" <al...@yahoo.com> wrote:

> Dear Guys,
> we are working on a search engine, and we have to use version 2.x (due to
> its ability to connect to HBase). We tried dozens of re-crawling scripts, but
> none of them works. Is there any re-crawling script for Nutch 2.x?
> We also added "db.fetch.interval.default" to the "nutch-site.xml" file, but
> it does not have any positive effect.
> Regards,

Re: Recrawling in nutch 2.x

Posted by Talat Uyarer <ta...@uyarer.com>.
Hi Ali,

Can you explain what you expect from recrawling? Do you want
to set the next fetch time, or do you want to rerun your crawler?

Talat
On 24 May 2014 at 12:14, "Ali rahmani" <al...@yahoo.com> wrote:

> Dear Guys,
> we are working on a search engine, and we have to use version 2.x (due to
> its ability to connect to HBase). We tried dozens of re-crawling scripts, but
> none of them works. Is there any re-crawling script for Nutch 2.x?
> We also added "db.fetch.interval.default" to the "nutch-site.xml" file, but
> it does not have any positive effect.
> Regards,