Posted to user@nutch.apache.org by jjmendes <jj...@student.dei.uc.pt> on 2016/10/21 19:42:05 UTC

Adding a set number of inner pages to the fetch list

To gather data for a study, I am currently using Nutch to go through a
list of web pages and download their HTML. The list consists solely of
home pages. However, it would be useful to also download at least one
other page from the same domain that is linked from each home page.
Is there an easy way to achieve this?

Thanks,

JJAM