Posted to user@nutch.apache.org by devang pandey <de...@gmail.com> on 2013/07/29 12:29:42 UTC
nutch crawldb analytics
Hello,
I am using Nutch 1.4 for crawling and have crawled a number of domains
successfully. My aim is to get complete details for a single domain; in
particular, I want lists of all db_fetched, db_unfetched and db_gone URLs.
Please suggest how to do this. A CrawlDB dump can give me this, but it
contains information about all domains, not just a particular one.
RE: nutch crawldb analytics
Posted by Markus Jelsma <ma...@openindex.io>.
The patch should work with any 1.x release: it doesn't change existing sources and only reads the CrawlDB.
-----Original message-----
> From:devang pandey <de...@gmail.com>
> Sent: Monday 29th July 2013 13:00
> To: user@nutch.apache.org
> Subject: Re: nutch crawldb analytics
>
> @Markus Jelsma Thanks for the reply. Will the HostDB tools work on Nutch 1.4?
> The thing is, I have to stick with 1.4 only. How can I use the HostDB tools
> in Nutch 1.4? Please guide me.
Re: nutch crawldb analytics
Posted by devang pandey <de...@gmail.com>.
@Markus Jelsma Thanks for the reply. Will the HostDB tools work on Nutch 1.4?
The thing is, I have to stick with 1.4 only. How can I use the HostDB tools
in Nutch 1.4? Please guide me.
On Mon, Jul 29, 2013 at 4:13 PM, Markus Jelsma
<ma...@openindex.io> wrote:
> Using the HostDB tools you can create a database of hosts and dump their
> statistics.
> https://issues.apache.org/jira/browse/NUTCH-1325
RE: nutch crawldb analytics
Posted by Markus Jelsma <ma...@openindex.io>.
Using the HostDB tools you can create a database of hosts and dump their statistics.
https://issues.apache.org/jira/browse/NUTCH-1325
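Until the HostDB patch is in place, a rough per-host summary can be approximated from a plain `readdb -dump` output. This is only a sketch and is NOT the NUTCH-1325 HostDB tool; it assumes the dump record layout `<url><TAB>Version: ...` followed by `Status: <code> (<name>)`, and `status_counts_per_host` is a hypothetical helper name.

```python
import re
from collections import Counter, defaultdict
from urllib.parse import urlsplit

# Assumed dump layout: "<url>\tVersion: ..." opens a record, then a line
# like "Status: 3 (db_gone)" carries the status name in parentheses.
STATUS_RE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def status_counts_per_host(lines):
    """Hypothetical helper: return {host: {status_name: count}}."""
    counts = defaultdict(Counter)
    host = None
    for line in lines:
        line = line.rstrip("\n")
        key, sep, rest = line.partition("\t")
        if sep and rest.startswith("Version:"):
            # New record: remember the host of its URL.
            host = urlsplit(key).hostname or ""
            continue
        match = STATUS_RE.match(line)
        if match and host is not None:
            counts[host][match.group(1)] += 1
            host = None  # count each record once
    return {h: dict(c) for h, c in counts.items()}
```

This only tallies statuses; the real HostDB may track other per-host data, so treat the output as an approximation.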