Posted to dev@nutch.apache.org by Piotr Kosiorowski <pk...@gmail.com> on 2005/04/27 22:25:52 UTC

Re: Nutch Distributed File System

Hello,

I updated the documentation on NDFS on the Wiki, fixed some minor issues
with NDFS command line handling (thanks Doug for committing it already) and
submitted a patch to JIRA for the NDFS Windows issues.

During my tests I found a small bug that was also present in the previous
version of the code: when TestClient is used to copy an NDFS file to
another NDFS file given by filename only (so "abc", but not "/abc" or
"def/abc"), TestClient throws a NullPointerException. The problem is
located in FileUtil.copyContents(). The code takes the parent of the File
object (in this case null) and checks whether it exists using the
NutchFileSystem.exists() method, which does not handle nulls in any
special way.
In my opinion it should check that the parent is not null before invoking
NutchFileSystem.exists(). I will send a patch for it soon.
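
To illustrate the kind of check I mean, here is a minimal sketch. It is not
the actual FileUtil.copyContents() code: SimpleFileSystem, InMemoryFileSystem
and parentNeedsCreating() are hypothetical names made up for this example.
The only real API it relies on is java.io.File, whose getParentFile() returns
null for a bare filename such as "abc".

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

public class ParentCheckSketch {

    /** Hypothetical stand-in for a file system whose exists() does not accept null. */
    public interface SimpleFileSystem {
        boolean exists(File f);
    }

    /** Toy in-memory implementation, for illustration only. */
    public static class InMemoryFileSystem implements SimpleFileSystem {
        private final Set<File> known = new HashSet<>();

        public void add(File f) {
            known.add(f.getAbsoluteFile());
        }

        @Override
        public boolean exists(File f) {
            // Dereferences f, so passing null throws NullPointerException.
            return known.contains(f.getAbsoluteFile());
        }
    }

    /**
     * Returns true only if dst has a parent and that parent does not exist yet.
     * The null test runs first, so exists() is never called with null.
     */
    public static boolean parentNeedsCreating(SimpleFileSystem fs, File dst) {
        File parent = dst.getParentFile(); // null for "abc", "def" for "def/abc"
        return parent != null && !fs.exists(parent);
    }

    public static void main(String[] args) {
        InMemoryFileSystem fs = new InMemoryFileSystem();
        // Bare filename: no parent, so nothing to create and no exception.
        System.out.println(parentNeedsCreating(fs, new File("abc")));
        // Name with a directory part: the parent "def" is not known yet.
        System.out.println(parentNeedsCreating(fs, new File("def/abc")));
    }
}
```

Without the null test, the "abc" case would reach exists(null) and fail in
the same way as the copy described above.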

Regards,
Piotr

On 4/20/05, Piotr Kosiorowski <pk...@gmail.com> wrote:
> Hello,
> 
> I was trying to set up NDFS on my notebook today and had some problems:
> 1) Documentation on
> http://wiki.apache.org/nutch/NutchDistributedFileSystem is a good start
> but it is a bit outdated (examples use old package names, properties in
> config files are not mentioned and some tools take different number or
> format of command line parameters). I can update it a bit if no one
> objects - but I want to make sure it would be enough to simply edit wiki
> page content. So is any other activity required to have it updated?
> 
> 2) NUTCH-46 - on Windows platform there are problems related to handling
> of file separators - nutch uses java.io.File object and sometimes
> creates paths by appending strings with Unix path convention. I will try
> to make it work on both platforms (I am using Windows for development as
> a company standard development machine so it is a problem for me). I
> will post a patch for it or ask further questions if required changes
> would not be trivial.
> 
> 3) There are some minor issues with NDFS for a beginner I would like to
> provide a patch for.
> E.g.
>  a) not all command line options are displayed when printing usage
> information for TestClient class
>  b) TestClient does not always check whether execution of the command failed.
>  c) It is not easy to start two instances of DataNode on one machine
> (for tests) - I would like to add command line options for starting data
> node in given directory on given port (they would have precedence over
> config entries).
> 
> I am sure I will have more comments as I make progress in
> discovering NDFS secrets :).
> So I will work on it in the next few days and provide patches for
> review.
> Regards,
> Piotr
>