Posted to user@nutch.apache.org by Paul Dhaliwal <su...@gmail.com> on 2007/01/09 18:21:19 UTC

LocalFileSystem , LinkDbReader and workingDir

Hello,

There is an issue with the way the LocalFileSystem.pathToFile(Path path)
function works.

It uses the workingDir member, which is computed from
System.getProperty("user.home") when the constructor is called. It can be
set, but LinkDbReader does not set it.

This leads to a file not found exception when the crawl folder is not in the
user's home directory.
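
To illustrate the behaviour described above, here is a minimal sketch in plain
Java. It is not the Hadoop LocalFileSystem code: the class WorkingDirSketch,
its setWorkingDirectory method, and the /data/crawl directory are hypothetical
names used only to show how a relative path ends up under user.home unless the
working directory is set explicitly.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in for the behaviour described in this post;
// this is NOT the actual Hadoop LocalFileSystem implementation.
public class WorkingDirSketch {
    // Assumption taken from the post: the working directory defaults to user.home.
    private Path workingDir = Paths.get(System.getProperty("user.home"));

    // Lets a caller point the working directory somewhere else.
    public void setWorkingDirectory(Path dir) {
        this.workingDir = dir.toAbsolutePath().normalize();
    }

    // Absolute paths pass through; relative paths are resolved against workingDir.
    public File pathToFile(Path path) {
        Path resolved = path.isAbsolute() ? path : workingDir.resolve(path);
        return resolved.normalize().toFile();
    }

    public static void main(String[] args) {
        WorkingDirSketch fs = new WorkingDirSketch();
        Path relative = Paths.get("linkdb", "current");

        // Resolved under user.home, which fails if the crawl lives elsewhere.
        System.out.println("default:   " + fs.pathToFile(relative));

        // Hypothetical crawl directory, set explicitly by the caller.
        fs.setWorkingDirectory(Paths.get("/data/crawl"));
        System.out.println("after set: " + fs.pathToFile(relative));
    }
}
```

With the default, the relative path is looked up under the home directory;
after the explicit set it is looked up under the crawl directory instead.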

The way I understand it, the crawl directory is "pretty much" the working
directory for Nutch segments, crawls, and everything else. Should the crawl
directory be set as the working directory for the file system?

I am not using LinkDbReader, but I am using the code from LinkDbReader and I
am not calling the LinkDbReader main.

I can work around the issue, but would appreciate some direction on how to
go about it.

Thanks in Advance,
Paul Dhaliwal

Re: LocalFileSystem , LinkDbReader and workingDir

Posted by Paul Dhaliwal <su...@gmail.com>.

Thank you.

On 1/9/07, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> Paul Dhaliwal wrote:
> > Hello,
> >
> > There is an issue with the way LocalFileSystem.pathToFile(Path path)
> > function works.
> >
> > It uses the workingDir member. It is computed from
> > System.getProperty("user.home"); when the constructor is called. It can be
> > set, but LinkDbReader does not set it.
> >
> > This leads to a file not found exception when crawl folder is not in
> > user's
> > home.
> >
> > The way I understand it, crawl directory is "pretty much" the working
> > directory for nutch segments, crawls, and everything else. Should crawl
> > directory be set as the working directory for the file system?
>
> This is purely a convention - in fact, I've seen in many cases DBs and
> segments and indexes put in completely different places, for operation
> reasons. user.home is at least predictable ...


Makes sense.

>
> > I am not using LinkDbReader, but I am using the code from LinkDbReader
> > and I
> > am not calling the LinkDbReader main.
> >
> > I can work around the issue, but would appreciate some direction on
> > how to
> > go about it.
>
> Since you are using LinkDbReader in a non-standard way I think it's best
> if you implement a workaround for this.


It's not so "non-standard"; I'm just using the code from LinkDbReader. It might
save someone some time in the future to set the working directory to the
crawl directory, or to add a note that anyone planning to use this code as a
base may need to set the working directory in LocalFileSystem to the crawl
directory.

> BTW. when using both local FS and Hadoop DFS I've long ago adopted a
> practice to use absolute paths for any arguments ... ;)


Thanks for the tip.

> --
> Best regards,
> Andrzej Bialecki     <><
> ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>
>
>
Paul

Re: LocalFileSystem , LinkDbReader and workingDir

Posted by Andrzej Bialecki <ab...@getopt.org>.

Paul Dhaliwal wrote:
> Hello,
>
> There is an issue with the way LocalFileSystem.pathToFile(Path path)
> function works.
>
> It uses the workingDir member. It is computed from
> System.getProperty("user.home"); when the constructor is called. It can be
> set, but LinkDbReader does not set it.
>
> This leads to a file not found exception when crawl folder is not in 
> user's
> home.
>
> The way I understand it, crawl directory is "pretty much" the working
> directory for nutch segments, crawls, and everything else. Should crawl
> directory be set as the working directory for the file system?

This is purely a convention. In fact, I've often seen DBs, segments, and
indexes put in completely different places for operational reasons.
user.home is at least predictable ...

>
> I am not using LinkDbReader, but I am using the code from LinkDbReader 
> and I
> am not calling the LinkDbReader main.
>
> I can work around the issue, but would appreciate some direction on 
> how to
> go about it.

Since you are using LinkDbReader in a non-standard way I think it's best 
if you implement a workaround for this.

BTW, when using both the local FS and Hadoop DFS, I long ago adopted the
practice of using absolute paths for any arguments ... ;)
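
A minimal sketch of that practice in plain Java, assuming the arguments are
local file system paths: each argument is converted to an absolute, normalized
path before it is handed to any other code. The class AbsoluteArgs and the
helper toAbsolute are hypothetical names, not part of Nutch or Hadoop.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of Nutch or Hadoop.
public class AbsoluteArgs {
    // Turns a possibly relative argument into an absolute, normalized path.
    // Relative input is resolved against the JVM's default directory
    // (typically the directory the process was started from).
    public static Path toAbsolute(String arg) {
        return Paths.get(arg).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Print each command-line argument next to its absolute form.
        for (String arg : args) {
            System.out.println(arg + " -> " + toAbsolute(arg));
        }
    }
}
```

Once every argument is absolute, the result no longer depends on whatever
working directory the file system object happens to hold.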

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com