Posted to user@nutch.apache.org by "emmanuel.csantana" <em...@gmail.com> on 2010/08/24 18:07:48 UTC

Re: Staying in Domain

"... don't you achieve the same
functionality using the db.ignore.external.links property in
nutch-default.xml?"

I have a similar doubt.
Setting db.ignore.external.links to true won't keep Nutch from reaching
external domains that it gets to through a redirect.

As quoted from:
http://lucene.472066.n3.nabble.com/db-ignore-external-links-true-and-redirects-td615411.html

"if I start at
http://www.xyz.com and Nutch finds a link pointing to
http://www.xyz.com/blog which is actually a redirection to
http://blog.xyz.com then Nutch will start fetching pages from
http://blog.xyz.com even though it was not in seed url file"
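
To make the scenario concrete, here is a minimal Python sketch of the kind of host check that would have to be applied to a redirect target as well as to ordinary outlinks. This is not Nutch code: the helper name stays_on_host is hypothetical, and comparing full host names is only my assumption about what "staying in domain" should mean here.

```python
from urllib.parse import urlsplit


def stays_on_host(source_url: str, target_url: str) -> bool:
    """Return True when both URLs point at the same host.

    Hypothetical helper for illustration only; it compares the full
    host name, so www.xyz.com and blog.xyz.com count as different.
    """
    # urlsplit(...).hostname is already lower-cased, or None if absent.
    source_host = urlsplit(source_url).hostname
    target_host = urlsplit(target_url).hostname
    return source_host is not None and source_host == target_host


if __name__ == "__main__":
    # The outlink itself looks internal, so it passes the check.
    print(stays_on_host("http://www.xyz.com", "http://www.xyz.com/blog"))
    # The redirect target is on another host; a check done only on the
    # outlink never sees this URL.
    print(stays_on_host("http://www.xyz.com/blog", "http://blog.xyz.com"))
```

The link to http://www.xyz.com/blog passes such a check, so the external host only shows up once the redirect is followed.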

Does this patch solve this?

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Staying-in-Domain-tp915885p1314022.html
Sent from the Nutch - User mailing list archive at Nabble.com.