Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2014/09/26 00:01:22 UTC

[Nutch Wiki] Update of "SetupNutchAndTor" by LewisJohnMcgibbney

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "SetupNutchAndTor" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/SetupNutchAndTor?action=diff&rev1=4&rev2=5

  <<TableOfContents(4)>>
  
  == Important Note ==
- The aim of this tutorial is to explain *crawling of* hidden services... not for us to use hidden services to crawl. This is a critical point which should both be taken into consideration when reading and using Nutch to crawl the Tor network.
+ The aim of this tutorial is to explain '''crawling of''' hidden services, not using hidden services to crawl. This is a critical point which should be taken into consideration both when reading this tutorial and when using Nutch to crawl the Tor network. Crawling normal websites via Tor can overload the Tor network and, more importantly, can lead those websites to block connections from Tor, preventing normal Tor users from reaching or using them.
- If you are looking to use Nutch to crawl the web from behind the Tor network, then you are in the wrong place.
+ '''If you are looking to use Nutch to crawl the web from behind the Tor network, then you are in the wrong place.'''
  
  == Introduction ==
  [[https://www.torproject.org/|Tor]] is a network of virtual tunnels that allows people and groups to improve their privacy and security on the Internet. It also enables software developers to create new communication tools with built-in privacy features. Tor provides the foundation for a range of applications that allow organizations and individuals to share information over public networks without compromising their privacy.