Posted to user@nutch.apache.org by Michael Coffey <mc...@yahoo.com.INVALID> on 2016/11/04 02:35:25 UTC

db.ignore.external.links

Does db.ignore.external.links accept only relative URLs? I am crawling a site, let's call it http://www.xyz.com. Its pages contain absolute links to the same host, such as <A HREF="http://www.xyz.com/business.html" >.


Those absolute URLs don't end up in the crawldb, but links written as relative URLs do. Is this normal, or am I confused?
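
To make the expectation concrete, here is a minimal Python sketch of the behavior I would have expected. It assumes the setting simply compares the host of the linking page with the host of each resolved outlink; that is my assumption, not something taken from the Nutch source. The helper name is_external and the example URLs are hypothetical.

```python
from urllib.parse import urljoin, urlparse


def is_external(page_url, href):
    """Hypothetical helper: True if href resolves to a different host than page_url."""
    # Resolve relative hrefs against the page they were found on.
    target = urljoin(page_url, href)
    # Compare hosts only (assumed rule, not confirmed against Nutch).
    return urlparse(page_url).hostname != urlparse(target).hostname


page = "http://www.xyz.com/index.html"

# Relative link: resolves to the same host.
print(is_external(page, "business.html"))
# Absolute link to the same host: expected to be treated the same way.
print(is_external(page, "http://www.xyz.com/business.html"))
# Link to a different host.
print(is_external(page, "http://www.example.org/"))
```

Under that assumption the relative and the absolute form of the link resolve to the same host, so neither should count as external.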