Posted to dev@nutch.apache.org by driki <gi...@git.apache.org> on 2012/09/08 21:35:00 UTC

nutch pull request: redirects treated as external links

GitHub user driki opened a pull request:

    https://github.com/apache/nutch/pull/1

    redirects treated as external links

    Hi,
    
    I ran into an issue where the crawler does not adhere to the db.ignore.external.links property when a link on the same domain redirects to an external domain. I tested the change locally against a few sites that I crawl, and it appears to be working.
    
    Thanks,
    Matt

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NearbyFYI/nutch 2.x

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nutch/pull/1.patch

----
commit a71133d257aa5c7835b9cc98134b1e7b3df5b5fe
Author: Matt MacDonald <ma...@gmail.com>
Date:   2012-09-08T12:25:28-07:00

    redirects treated as external links

----
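
The behavior described in the pull request can be illustrated with a minimal sketch. This is not the code from the patch and not Nutch's actual API: the class RedirectFilterSketch and the methods hostOf and shouldFollowRedirect are hypothetical names, and the sketch assumes that "external" means the redirect target has a different host than the page that issued the redirect. The ignoreExternalLinks flag stands in for the value of db.ignore.external.links.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Nutch.
class RedirectFilterSketch {

    // Returns the lowercased host of a URL, or null if the URL
    // cannot be parsed or has no host component.
    static String hostOf(String url) {
        try {
            String host = new URI(url).getHost();
            return host == null ? null : host.toLowerCase(Locale.ROOT);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // Decides whether a redirect from fromUrl to toUrl should be followed.
    // When ignoreExternalLinks is true (assumed to mirror
    // db.ignore.external.links), a redirect to a different host is
    // treated like any other external link and is dropped.
    static boolean shouldFollowRedirect(String fromUrl, String toUrl,
                                        boolean ignoreExternalLinks) {
        if (!ignoreExternalLinks) {
            return true;
        }
        String fromHost = hostOf(fromUrl);
        String toHost = hostOf(toUrl);
        return fromHost != null && fromHost.equals(toHost);
    }
}
```

With the flag enabled, a redirect that stays on the same host is followed, while a redirect that leaves the host is rejected. Comparing hosts rather than registered domains is a simplification chosen for this sketch.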