Posted to user@nutch.apache.org by shantanu <sh...@gmail.com> on 2011/06/02 22:43:20 UTC

bypass crawl-urlfilter.txt

Hi, I am trying to use Nutch to crawl websites one at a time. However, I do
not want to use crawl-urlfilter.txt for filtering URLs. Instead, I want to
do the filtering with a class from Nutch, but I am not sure which one. Can
somebody guide me on this?

Example: I would tell it to crawl http://www.amazon.com, and it should not
consult crawl-urlfilter.txt but apply the filtering in the Java program itself.
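
To make the idea concrete, here is a rough sketch of the kind of in-program
filter I have in mind. It is plain Java and does not use any Nutch classes:
the class name HostUrlFilter and its constructor are hypothetical, and the
"return the URL to keep it, return null to drop it" convention is only my
assumption about how a Nutch URL filter behaves. It keeps URLs whose host
matches the site being crawled and rejects everything else.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-alone filter; NOT a Nutch class.
// Assumed convention: return the URL to accept it, null to reject it.
final class HostUrlFilter {
    private final String allowedHost;

    HostUrlFilter(String allowedHost) {
        this.allowedHost = allowedHost;
    }

    String filter(String url) {
        try {
            String host = new URI(url).getHost();
            // Accept only URLs on the one host being crawled.
            if (host != null && host.equalsIgnoreCase(allowedHost)) {
                return url;
            }
            return null;
        } catch (URISyntaxException e) {
            // Malformed URLs are rejected rather than crawled.
            return null;
        }
    }

    public static void main(String[] args) {
        HostUrlFilter f = new HostUrlFilter("www.amazon.com");
        System.out.println(f.filter("http://www.amazon.com/books")); // kept
        System.out.println(f.filter("http://www.example.org/"));     // dropped (null)
    }
}
```

What I do not know is which Nutch class or extension point I would plug
logic like this into so that crawl-urlfilter.txt is not read at all.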
