Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2006/09/11 02:10:33 UTC

[Nutch Wiki] Update of "FAQ" by JimboJw

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by JimboJw:
http://wiki.apache.org/nutch/FAQ

------------------------------------------------------------------------------
      </property>
  
  Now you can invoke the crawler and index all or part of your disk. The only remaining gotcha is that Mozilla will '''not''' load file: URLs from a web page fetched over http. If you test with the Nutch web container running in Tomcat, nothing will happen when you click on results. This is mentioned [http://www.mozilla.org/quality/networking/testing/filetests.html here], and the behavior may be disabled by a [http://www.mozilla.org/quality/networking/docs/netprefs.html preference] (see security.checkloaduri). IE5 does not have this problem.
+ 
+ ==== How do I index remote file shares? ====
+ 
+ At the current time, Nutch does not have built-in support for accessing files over SMB (Windows) shares. The only available method is to mount the shares yourself, then index the contents as though they were local directories (see above).
+ 
+ Note that the share mounting method suffers from the following drawbacks:
+ # The links generated by Nutch will only work for queries from localhost (end users typically won't have the exact same shares mounted in the exact same way).
+ # You are limited to the number of mounted shares your operating system supports. In *nix environments this is effectively unlimited, but in Windows you may mount at most 26 (one share or drive per letter of the English alphabet).
+ # Links inside documents that point to shares are unlikely to work, since they refer to the SMB location rather than to the share as mounted on your machine.
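
As a rough illustration of the mount-then-index approach, the sketch below turns a list of local mount points into file: URLs that could be placed in a crawl seed list. It is not part of Nutch: the function name seed_urls and the /mnt/shares/... paths are hypothetical, and it assumes the shares are already mounted on a *nix machine.

```python
from pathlib import PurePosixPath


def seed_urls(mount_points):
    """Return one file: URL per locally mounted share directory.

    Hypothetical helper: the mount points are assumed to be absolute
    *nix paths where the SMB shares have already been mounted.
    """
    urls = []
    for mount_point in mount_points:
        path = PurePosixPath(mount_point)
        if not path.is_absolute():
            raise ValueError(f"mount point must be absolute: {mount_point}")
        # as_uri() percent-encodes characters such as spaces; the trailing
        # slash marks the URL as a directory.
        urls.append(path.as_uri() + "/")
    return urls


if __name__ == "__main__":
    # Example mount points (assumed, not taken from the FAQ).
    for url in seed_urls(["/mnt/shares/docs", "/mnt/shares/reports"]):
        print(url)
```

The resulting URLs refer to paths on the indexing machine only, which is exactly why the drawbacks listed above apply.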
  
  ==== While indexing documents, I get the following error: ====