Posted to user@nutch.apache.org by karthik085 <ka...@gmail.com> on 2007/05/09 23:53:09 UTC

Readdb question

Hi,

I crawled a website that has hundreds or thousands of pages. I asked Nutch
to fetch only the 54 pages I wanted, which it did. When I put the database up
for searching and searched for the host name, it said there were 54 matching
pages, so it did the right thing.

When I run nutch readdb crawled-database/db/ -stats, I get:
Number of pages: 356
Number of links: 355

What does this mean? Are these the numbers of pages and links in the
website? In that case, what is the difference between pages and links?
Does a page represent a link?

Thanks,
Karthik
-- 
View this message in context: http://www.nabble.com/Readdb-question-tf3718517.html#a10403539
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Readdb question

Posted by Marcin Okraszewski <ok...@o2.pl>.
http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html#webdb
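The linked article describes the WebDB as a web graph: pages are the nodes (each known URL) and links are the edges (a link from a source page to a target page). The minimal Python sketch below models that idea to show why readdb -stats can report many more pages than were fetched and indexed. It assumes that every outlink target found on a fetched page is recorded as a page even if it is never fetched itself. ToyWebDB and its methods are hypothetical illustrations, not Nutch's API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyWebDB:
    """Hypothetical in-memory model of a WebDB-style web graph (not Nutch's API).

    pages: every known URL (graph nodes), mapped to whether it was fetched.
    links: directed (source URL, target URL) edges found on fetched pages.
    """
    pages: dict = field(default_factory=dict)
    links: set = field(default_factory=set)

    def add_page(self, url):
        # A newly discovered URL becomes an unfetched page; existing entries are kept.
        self.pages.setdefault(url, False)

    def record_fetch(self, url, outlinks):
        # Mark the page as fetched, then record each outlink as an edge.
        # The target also becomes a known page, even though it is not fetched.
        self.pages[url] = True
        for target in outlinks:
            self.add_page(target)
            self.links.add((url, target))

    def stats(self):
        return {
            "pages": len(self.pages),
            "fetched": sum(self.pages.values()),
            "links": len(self.links),
        }


db = ToyWebDB()
db.add_page("http://example.com/")
# Fetch only two pages; their outlinks reveal three more URLs that are never fetched.
db.record_fetch("http://example.com/", ["http://example.com/a",
                                        "http://example.com/b",
                                        "http://example.com/c"])
db.record_fetch("http://example.com/a", ["http://example.com/d",
                                         "http://example.com/"])
print(db.stats())  # → {'pages': 5, 'fetched': 2, 'links': 5}
```

In this toy model only the 2 fetched pages would be searchable, while the page count also includes the 3 URLs that were merely discovered. That is the same kind of gap as 54 searchable pages versus 356 pages in the stats output.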


