
Posted to user@nutch.apache.org by A Laxmi <a....@gmail.com> on 2014/04/30 19:33:05 UTC

Re: about time for recrawl a url

Hi Feng,

What similar command can I use with Nutch 2.2.1 and HBase to get the fetch
time and retry interval?

Thanks!


On Fri, Sep 6, 2013 at 10:33 AM, feng lu <am...@gmail.com> wrote:

> Hi Eyeris
>
> Use this command:
>
> bin/nutch readdb <crawldb> -url <url>
>
> Example output looks like this:
>
> $ bin/nutch readdb crawldb/ -url http://news.163.com/05/0920/16/1U3U7N9P0001121R.html
> URL: http://news.163.com/05/0920/16/1U3U7N9P0001121R.html
> Version: 7
> Status: 1 (db_unfetched)
> Fetch time: Sun Aug 11 00:43:13 CST 2013
> Modified time: Thu Jan 01 08:00:00 CST 1970
> Retries since fetch: 0
> Retry interval: 2592000 seconds (30 days)
> Score: 4.4117645E-5
> Signature: null
> Metadata:
>
> You can see the fetch time and the retry interval; the next fetch time is
> equal to the fetch time plus the retry interval.
>
>
> On Fri, Sep 6, 2013 at 10:26 PM, Eyeris Rodriguez Rueda <erueda@uci.cu> wrote:
>
> > Hi all.
> > I want to know how the recrawl time for a URL is determined. Any idea
> > where I can learn about that?
> > I'm using Nutch 1.5.1.
> >
> > I know that the next fetch time is initially based on the
> > db.fetch.interval.default property and that it is then adjusted by
> > db.fetch.schedule.adaptive.inc_rate and
> > db.fetch.schedule.adaptive.dec_rate,
> > but how can I check the next fetch time for a single URL, and which
> > commands can I use?
> > Some help would be appreciated.
> > Thanks.
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>
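
For illustration only, here is a minimal Python sketch of the arithmetic described in the quoted reply (fetch time plus retry interval). It takes that rule at face value and is not part of Nutch; the names SAMPLE and next_fetch are hypothetical, and the input is assumed to look like the readdb output quoted above. The time zone abbreviation is returned as text rather than interpreted, since abbreviations such as CST are ambiguous.

```python
import re
from datetime import datetime, timedelta

# Two lines copied from the example readdb output quoted in this thread.
SAMPLE = """\
Fetch time: Sun Aug 11 00:43:13 CST 2013
Retry interval: 2592000 seconds (30 days)
"""


def next_fetch(readdb_output):
    """Return (fetch time + retry interval, zone abbreviation).

    Hypothetical helper: assumes the 'Fetch time' and 'Retry interval'
    lines have the same layout as in the quoted example output.
    """
    fetch = re.search(
        r"^Fetch time: (\w{3} \w{3} +\d+ [\d:]+) (\S+) (\d{4})$",
        readdb_output,
        re.M,
    )
    interval = re.search(r"^Retry interval: (\d+) seconds", readdb_output, re.M)
    if not fetch or not interval:
        raise ValueError("Fetch time or Retry interval line not found")

    stamp, zone, year = fetch.groups()
    # Parse as a naive timestamp; the zone abbreviation is kept as plain text.
    base = datetime.strptime(f"{stamp} {year}", "%a %b %d %H:%M:%S %Y")
    return base + timedelta(seconds=int(interval.group(1))), zone


if __name__ == "__main__":
    when, zone = next_fetch(SAMPLE)
    print(when.strftime("%a %b %d %H:%M:%S"), zone, when.year)
```

The output of `bin/nutch readdb crawldb/ -url <url>` could be saved to a file and passed to next_fetch in the same way.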