Posted to user@nutch.apache.org by ir <ir...@gmail.com> on 2005/06/01 02:00:45 UTC

Do I understand fetch right?

I have a urlfile with 1 site.  I inject it into the db and then do a
fetch and it fetches 1 page.

I update the db with it (65 entries inserted; I'm guessing that's the number
of links on that one fetched page), then I generate the segments again.

I do a fetch again and it gets 65 pages. I update the db with those, generate
segments again, do another fetch, and it gets 2000+ pages.

So, as I understand it, each generate/fetch cycle is like going one level
deeper? So running four generate/fetch cycles would be the same as running
crawl with -depth 4?
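For reference, the sequence I'm running looks roughly like this (a sketch based on the 0.7-era tutorial commands; exact flags may differ in your version, and `db`, `segments`, and `urls.txt` are just my local names):

```shell
# One-time setup: create the web db and inject the seed URL file
bin/nutch admin db -create
bin/nutch inject db -urlfile urls.txt

# One "level" of crawling: generate a fetchlist into a new segment,
# fetch that segment, then update the db with the discovered links.
bin/nutch generate db segments
s=`ls -d segments/2* | tail -1`   # pick the newest segment
bin/nutch fetch $s
bin/nutch updatedb db $s
```

Repeating the generate/fetch/updatedb block four times would then, I assume, cover the same ground as crawl with -depth 4.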

A second question: is there a way, from the command line, to get the
total number of pages you have indexed? Thanks.