Posted to dev@nutch.apache.org by driki <gi...@git.apache.org> on 2012/09/08 21:35:00 UTC
nutch pull request: redirects treated as external links
GitHub user driki opened a pull request:
https://github.com/apache/nutch/pull/1
redirects treated as external links
Hi,
I ran into an issue where the crawler does not adhere to the db.ignore.external.links property when it encounters a link on the same domain that redirects to an external domain. I tested the change locally against a few sites that I crawl, and it appears to be working.
Thanks,
Matt
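For readers unfamiliar with the problem, the check described above can be sketched roughly as follows. This is only an illustration and not the code from the pull request: the class RedirectCheck and the method isExternalRedirect are hypothetical names, and comparing the two hosts case-insensitively is an assumption about what counts as "external".

```java
import java.net.URI;

class RedirectCheck {
    // Hypothetical helper, not part of Nutch: returns true when the redirect
    // target lives on a different host than the page that issued the redirect.
    static boolean isExternalRedirect(String fromUrl, String toUrl) {
        String fromHost = URI.create(fromUrl).getHost();
        String toHost = URI.create(toUrl).getHost();
        if (fromHost == null || toHost == null) {
            // Assumption: a URL without a recognizable host is treated as external.
            return true;
        }
        return !fromHost.equalsIgnoreCase(toHost);
    }

    public static void main(String[] args) {
        // Same host: the redirect stays internal.
        System.out.println(isExternalRedirect("http://example.com/a", "http://example.com/b")); // false
        // Different host: the redirect leaves the site.
        System.out.println(isExternalRedirect("http://example.com/a", "http://other.example.org/")); // true
    }
}
```

With db.ignore.external.links enabled, a crawler applying a check like this to redirect targets, and not only to outlinks, would skip the second URL instead of following it.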
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/NearbyFYI/nutch 2.x
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/nutch/pull/1.patch
----
commit a71133d257aa5c7835b9cc98134b1e7b3df5b5fe
Author: Matt MacDonald <ma...@gmail.com>
Date: 2012-09-08T12:25:28-07:00
redirects treated as external links
----