Posted to dev@nutch.apache.org by xingjian <xi...@gmail.com> on 2007/11/16 06:00:08 UTC

About Heritrix crawling: can anyone on this Nutch list help? Thanks

I am following section A.3, "Mirroring .html Files Only", in
http://crawler.archive.org/articles/user_manual/usecases.html

......
On the Settings screen, I'll want to set the following for the
NotMatchesFilePatternDecideRule:

decision: REJECT
use-preset-pattern: CUSTOM
regexp: .*(/|\.html)$
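
To make the effect of these three settings concrete, here is a rough Python
sketch of how I understand the rule. It is not Heritrix code: the decide()
function and the example.com URIs are made up for illustration, and it assumes
the rule tests the whole URI against the regexp and leaves matching URIs to
whatever the earlier rules decided.

```python
import re

# Pattern copied verbatim from the "regexp" setting above.
PATTERN = re.compile(r".*(/|\.html)$")


def decide(uri: str) -> str:
    """Hypothetical stand-in for NotMatchesFilePatternDecideRule with decision REJECT.

    Assumption: the whole URI must match the pattern, so fullmatch() is used.
    """
    if PATTERN.fullmatch(uri) is None:
        # The URI does NOT match, so the configured decision (REJECT) applies.
        return "REJECT"
    # The URI matches, so this rule has no opinion and earlier decisions stand.
    return "PASS"


if __name__ == "__main__":
    # Made-up sample URIs, only to show which ones the pattern lets through.
    samples = [
        "http://example.com/",
        "http://example.com/docs/index.html",
        "http://example.com/images/logo.png",
        "http://example.com/page.html?x=1",
    ]
    for uri in samples:
        print(decide(uri), uri)
```

Under those assumptions, only URIs ending in "/" or ".html" get through; a
URI with a query string after ".html" is rejected because the pattern is
anchored at the end.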


......

How do I configure the above under Submodules in Heritrix? I don't know how.
Can anyone help me? Thanks.

-- 
View this message in context: http://www.nabble.com/about-heritrix-crawl%2CWho-will-tell-me-in-this-Nutch-forum-thanks-tf4819146.html#a13787379
Sent from the Nutch - Dev mailing list archive at Nabble.com.