Posted to user@nutch.apache.org by al...@aim.com on 2012/03/02 06:19:34 UTC

different fetch interval for each depth urls

Hello,

I need different fetch intervals for the initial seed URLs and for the URLs extracted from them at depth 1. How can this be achieved? I tried the -adddays option of the generate command, but it does not seem to solve this issue.

Thanks in advance.
Alex.

Re: different fetch interval for each depth urls

Posted by Markus Jelsma <ma...@openindex.io>.
 On Fri, 2 Mar 2012 14:32:48 -0500 (EST), alxsss@aim.com wrote:
> I need to run this as a cron job, so I cannot make changes manually.
> My problem is to index newspaper sites, fetching only the new links
> that are added every day and not re-fetching ones that have already
> been fetched.
>

 I see. Trunk can generate records restricted by status:
 generate -Dgenerate.restrict.status=<status>
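 As a rough sketch of how this could be wired into a cron job: the script
 below only builds and prints the generate command (a dry run). The paths
 and the status value db_unfetched are assumptions for illustration and
 are not taken from this thread; check the status names your Nutch build
 accepts before relying on it.

```shell
#!/bin/sh
# Dry-run sketch of a cron-driven generate step; it only prints the command.
# All paths below are hypothetical placeholders, not from the thread.
NUTCH_BIN="bin/nutch"
CRAWLDB="crawl/crawldb"
SEGMENTS="crawl/segments"
# Assumed status name for never-fetched records; verify against your build.
RESTRICT_STATUS="db_unfetched"

# Build the generate command with the status restriction mentioned above.
generate_cmd() {
  printf '%s generate -Dgenerate.restrict.status=%s %s %s\n' \
    "$NUTCH_BIN" "$RESTRICT_STATUS" "$CRAWLDB" "$SEGMENTS"
}

# Print the command; a real cron script would execute it instead.
generate_cmd
```

 Restricting generation to records that have never been fetched is one way
 to pick up only newly discovered links on each run.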

-- 
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536600 / 06-50258350

Re: different fetch interval for each depth urls

Posted by al...@aim.com.
 I need to run this as a cron job, so I cannot make changes manually.
My problem is to index newspaper sites, fetching only the new links that are added every day and not re-fetching ones that have already been fetched.

Thanks.
Alex.

Re: different fetch interval for each depth urls

Posted by Markus Jelsma <ma...@openindex.io>.
 Well, you could set a new default fetch interval in your configuration
 after the first crawl cycle, but the depth information is lost if you
 continue crawling, so there is no real solution.
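 A minimal dry-run sketch of that workaround, assuming the interval is
 controlled by the db.fetch.interval.default property (in seconds) and
 that it can be overridden per invocation with -D instead of editing the
 configuration file. The paths, the interval values, and the choice of the
 updatedb step are illustrative assumptions, not something stated in this
 thread.

```shell
#!/bin/sh
# Dry-run sketch: prints commands instead of running them.
NUTCH_BIN="bin/nutch"          # hypothetical path
CRAWLDB="crawl/crawldb"        # hypothetical path
FIRST_CYCLE_INTERVAL=86400     # assumed value: 1 day, in seconds
LATER_INTERVAL=2592000         # assumed value: 30 days, in seconds

# Pick the default fetch interval based on the crawl cycle number.
interval_for_cycle() {
  if [ "$1" -le 1 ]; then
    echo "$FIRST_CYCLE_INTERVAL"
  else
    echo "$LATER_INTERVAL"
  fi
}

# Print an updatedb command that overrides the default fetch interval.
# $1 = cycle number, $2 = segment directory
updatedb_cmd() {
  printf '%s updatedb -Ddb.fetch.interval.default=%s %s %s\n' \
    "$NUTCH_BIN" "$(interval_for_cycle "$1")" "$CRAWLDB" "$2"
}

updatedb_cmd 1 crawl/segments/SEGMENT
updatedb_cmd 2 crawl/segments/SEGMENT
```

 This only switches the default between cycles; it does not track depth per
 URL, which is the limitation described above.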

 What problem are you trying to solve anyway?
