Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2015/06/02 18:56:58 UTC

Nutch errors on VirtualBox shared folders

Hi Folks,
I wanted to share with this list some observations and findings regarding
the above topic and how Nutch behaves in that setup. [0]

Essentially, this comes down to the following: "By default, Vagrant maps the
'source' directory on the host machine to /vagrant on the client. This is
handy, particular when you want to make local source changes and see how it
affects the deployed machine.
This can break in situations where the program is running in the local
source directory, or when operations on the source directory are sensitive
to the file system type."

The team at Continuum Analytics worked around this issue by running the
crawls in /home/vagrant, which is *not* mapped; see [1].
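
For anyone wanting to check whether a crawl directory sits on a mapped
folder, here is a minimal sketch. It is not part of Nutch or of the fix in
[1]. It assumes a Linux guest where mounts are listed in /proc/mounts and
where VirtualBox shared folders appear with the file system type "vboxsf".
The helper names fs_type_for and is_on_shared_folder are hypothetical and
exist only for illustration.

```python
import os

# File system types treated as shared folders in this sketch (an assumption
# for illustration; "vboxsf" is the type used by VirtualBox shared folders).
SHARED_FS_TYPES = {"vboxsf"}


def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains `path`.

    `mounts_text` is text in the format of /proc/mounts:
        device mount_point fs_type options dump pass
    Mount points containing escaped characters (e.g. spaces written as
    \\040) are not handled here.
    """
    path = os.path.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # A mount contains the path if it is the path itself or a parent of it.
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest mount point; on ties, the later entry wins.
            if len(mount_point) >= len(best_mount):
                best_mount, best_type = mount_point, fs_type
    return best_type


def is_on_shared_folder(path, mounts_text):
    """True if `path` lives on one of the SHARED_FS_TYPES file systems."""
    return fs_type_for(path, mounts_text) in SHARED_FS_TYPES


if __name__ == "__main__":
    # /proc/mounts is Linux-specific; the two paths are the ones from this thread.
    with open("/proc/mounts") as f:
        mounts = f.read()
    for candidate in ("/vagrant", "/home/vagrant"):
        print(candidate, "->", fs_type_for(candidate, mounts))
```

Run inside the guest, this prints the file system type backing each
directory, so a crawl directory under a shared folder can be spotted before
the crawl starts.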

The issue is fixed for us, but it's exposed an underlying issue in the way
Nutch interacts with a "hostile" file system, and the Nutch developers
might want to take a look at this to harden the crawler against similar
issues in the future.

[0] https://github.com/memex-explorer/memex-explorer/issues/558
[1] https://github.com/memex-explorer/memex-explorer/pull/557

-- 
*Lewis*