Posted to user@nutch.apache.org by devang pandey <de...@gmail.com> on 2013/07/29 12:29:42 UTC

nutch crawldb analytics

Hello,

I am using Nutch 1.4 for crawling and have crawled a number of
domains successfully. My aim is to get complete details for a single domain.
In particular, I want lists of all db_fetched, db_unfetched, and
db_gone URLs. Please suggest how to do this. A CrawlDB dump would
work, but it contains information about all domains, not just a
particular one.
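
One way to narrow a dump to a single domain is to post-process the text output. The following is a minimal sketch, not part of Nutch. It assumes the text dump writes each record as a line of the form "<url><TAB>Version: <n>" followed by a line such as "Status: 2 (db_fetched)"; check this against your own dump before relying on it. The function name urls_by_status and the sample records are hypothetical.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed status line in the text dump, e.g. "Status: 2 (db_fetched)".
STATUS_RE = re.compile(r"^Status:\s+\d+\s+\((\w+)\)")


def urls_by_status(lines, domain):
    """Group the URLs of one domain (and its subdomains) by status name."""
    grouped = defaultdict(list)
    current_url = None
    for line in lines:
        line = line.rstrip("\n")
        # Assumed record header: "<url>\tVersion: <n>".
        if "\tVersion:" in line:
            current_url = line.split("\t", 1)[0]
            continue
        match = STATUS_RE.match(line)
        if match and current_url is not None:
            host = urlsplit(current_url).hostname or ""
            if host == domain or host.endswith("." + domain):
                grouped[match.group(1)].append(current_url)
            current_url = None
    return dict(grouped)


# Hypothetical sample records in the assumed dump layout.
sample = [
    "http://example.com/\tVersion: 7\n",
    "Status: 2 (db_fetched)\n",
    "Score: 1.0\n",
    "\n",
    "http://blog.example.com/old\tVersion: 7\n",
    "Status: 3 (db_gone)\n",
    "\n",
    "http://notexample.com/\tVersion: 7\n",
    "Status: 1 (db_unfetched)\n",
]

result = urls_by_status(sample, "example.com")
for status, urls in result.items():
    print(status, urls)
```

To run it on real data, pass an open file handle for one of the text files written by "bin/nutch readdb <crawldb> -dump <out_dir>" in place of the sample list.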

RE: nutch crawldb analytics

Posted by Markus Jelsma <ma...@openindex.io>.
The patch should work with any 1.x release; it does not change existing sources and only reads the CrawlDB.
 
 

Re: nutch crawldb analytics

Posted by devang pandey <de...@gmail.com>.
@Markus Jelsma Thanks for the reply. Will the HostDB tools work on Nutch 1.4?
I have to stick to 1.4. How can I use the HostDB tools in
Nutch 1.4? Please guide me.



RE: nutch crawldb analytics

Posted by Markus Jelsma <ma...@openindex.io>.
Using the HostDB tools you can create a database of hosts and dump their statistics.
https://issues.apache.org/jira/browse/NUTCH-1325
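
For a rough idea of what per-host statistics look like, the sketch below counts statuses per host from a CrawlDB text dump. It is not the HostDB tool from NUTCH-1325 and does not use its code or output format; it is a stand-in that assumes each dump record starts with "<url><TAB>Version: <n>" and is followed by a line such as "Status: 2 (db_fetched)". The function name host_status_counts and the sample records are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def host_status_counts(lines):
    """Count CrawlDB status names per host from text-dump lines."""
    counts = defaultdict(Counter)
    host = None
    for line in lines:
        line = line.strip()
        # Assumed record header: "<url>\tVersion: <n>".
        if "\tVersion:" in line:
            host = urlsplit(line.split("\t", 1)[0]).hostname
        elif line.startswith("Status:") and host is not None:
            # "Status: 2 (db_fetched)" -> "db_fetched"
            name = line.split("(", 1)[-1].split(")", 1)[0]
            counts[host][name] += 1
            host = None
    return counts


# Hypothetical sample records in the assumed dump layout.
dump_lines = [
    "http://example.com/\tVersion: 7",
    "Status: 2 (db_fetched)",
    "http://example.com/about\tVersion: 7",
    "Status: 2 (db_fetched)",
    "http://example.com/missing\tVersion: 7",
    "Status: 3 (db_gone)",
    "http://other.org/\tVersion: 7",
    "Status: 1 (db_unfetched)",
]

stats = host_status_counts(dump_lines)
for host_name, counter in sorted(stats.items()):
    print(host_name, dict(counter))
```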

 
 