Posted to user@nutch.apache.org by Viksit Gaur <vi...@gmail.com> on 2008/06/12 08:22:09 UTC

Retrieving data for a particular URL from crawldb?

Hi all,

Is there a way to retrieve a particular page from a Nutch crawl using
its URL as the key? I don't know which segment directory the page was
put into, so I can't use nutch readseg. I could use nutch readdb, but
that tool only gives stats about the URL and not its contents.

Any ideas on the best way to do this?

Thanks,
Viksit