Posted to user@nutch.apache.org by Paul Dhaliwal <su...@gmail.com> on 2007/01/09 18:21:19 UTC
LocalFileSystem , LinkDbReader and workingDir
Hello,
There is an issue with the way the LocalFileSystem.pathToFile(Path path)
function works.
It uses the workingDir member, which is computed from
System.getProperty("user.home") when the constructor is called. It can be
set, but LinkDbReader does not set it.
This leads to a file not found exception when the crawl folder is not in the
user's home directory.
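To make the failure mode concrete, here is a minimal Java sketch of the behavior described above. It is a hypothetical, simplified model (the class WorkingDirModel and its methods are invented for illustration), not the actual Hadoop/Nutch LocalFileSystem source: absolute paths are used as-is, while relative paths are joined onto a workingDir captured once at construction from user.home. A relative crawl path therefore only resolves correctly if the crawl folder lives under the home directory, unless the working directory is set explicitly first.

```java
import java.io.File;

/**
 * Hypothetical, simplified model of the path resolution described in this
 * thread. NOT the real Hadoop/Nutch LocalFileSystem code: it only mirrors
 * the reported behavior (workingDir captured once at construction).
 */
final class WorkingDirModel {
    // Initialized once, when the object is constructed.
    private File workingDir;

    /** Default: working directory taken from the user.home system property. */
    WorkingDirModel() {
        this(new File(System.getProperty("user.home")));
    }

    WorkingDirModel(File initialWorkingDir) {
        this.workingDir = initialWorkingDir;
    }

    /** Callers must invoke this explicitly; nothing else updates it. */
    void setWorkingDirectory(File dir) {
        this.workingDir = dir;
    }

    File getWorkingDirectory() {
        return workingDir;
    }

    /** Absolute paths are returned unchanged; relative ones go under workingDir. */
    File pathToFile(String path) {
        File f = new File(path);
        return f.isAbsolute() ? f : new File(workingDir, path);
    }
}
```

For example, with a working directory of /home/paul, pathToFile("crawl/linkdb") yields /home/paul/crawl/linkdb even if the crawl actually lives under /data; calling setWorkingDirectory(new File("/data")) first yields /data/crawl/linkdb instead.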
The way I understand it, the crawl directory is "pretty much" the working
directory for Nutch segments, crawls, and everything else. Should the crawl
directory be set as the working directory for the file system?
I am not using LinkDbReader itself; I am reusing code from LinkDbReader, so
I am not calling LinkDbReader's main.
I can work around the issue, but would appreciate some direction on how to
go about it.
Thanks in advance,
Paul Dhaliwal
Re: LocalFileSystem , LinkDbReader and workingDir
Posted by Paul Dhaliwal <su...@gmail.com>.
Thank you.
On 1/9/07, Andrzej Bialecki <ab...@getopt.org> wrote:
>
> Paul Dhaliwal wrote:
> > Hello,
> >
> > There is an issue with the way LocalFileSystem.pathToFile(Path path)
> > function works.
> >
> > It uses the workingDir member. It is computed from
> > System.getProperty("user.home"); when the constructor is called. It can
> > be set, but LinkDbReader does not set it.
> >
> > This leads to a file not found exception when crawl folder is not in
> > user's home.
> >
> > The way I understand it, crawl directory is "pretty much" the working
> > directory for nutch segments, crawls, and everything else. Should crawl
> > directory be set as the working directory for the file system?
>
> This is purely a convention - in fact, I've seen many cases where DBs,
> segments and indexes were put in completely different places for
> operational reasons. user.home is at least predictable ...
Makes sense.
>
> > I am not using LinkDbReader, but I am using the code from LinkDbReader
> > and I am not calling the LinkDbReader main.
> >
> > I can work around the issue, but would appreciate some direction on
> > how to go about it.
>
> Since you are using LinkDbReader in a non-standard way, I think it's best
> if you implement a workaround for this.
It's not so "non-standard"; I am just reusing the code from LinkDbReader. It
might save someone some time in the future to set the working directory to
the crawl directory, or to add a note that if you plan to use this code as
your base, you might need to set the LocalFileSystem working directory to
the crawl directory.
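A minimal sketch of that suggestion, assuming Java 7+ java.nio.file rather than the Hadoop Path/FileSystem classes Nutch actually uses: instead of relying on a file system's mutable working directory, each component is resolved against an explicitly supplied crawl directory. Path.resolve returns its argument unchanged when that argument is already absolute, so DBs or segments kept elsewhere (as Andrzej describes) still work. The CrawlLayout class and its methods are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper: resolves crawl components (linkdb, segments, ...)
 * against an explicit crawl directory instead of an implicit working dir.
 */
final class CrawlLayout {
    private final Path crawlDir;

    CrawlLayout(String crawlDir) {
        // Anchor the crawl directory itself once, so later resolution never
        // depends on whatever working directory a file system happens to use.
        this.crawlDir = Paths.get(crawlDir).toAbsolutePath().normalize();
    }

    /**
     * Relative names are resolved under the crawl directory; absolute names
     * are returned unchanged (standard Path.resolve semantics).
     */
    Path resolve(String component) {
        return crawlDir.resolve(component).normalize();
    }

    Path crawlDir() {
        return crawlDir;
    }
}
```

The design choice here is to pass the base directory around explicitly rather than mutate shared state, so two pieces of code reusing LinkDbReader-style logic cannot silently disagree about where "linkdb" lives.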
> BTW, when using both the local FS and Hadoop DFS, I long ago adopted the
> practice of using absolute paths for all arguments ... ;)
Thanks for the tip.
> --
> Best regards,
> Andrzej Bialecki <><
> ___. ___ ___ ___ _ _ __________________________________
> [__ || __|__/|__||\/| Information Retrieval, Semantic Web
> ___|||__|| \| || | Embedded Unix, System Integration
> http://www.sigram.com Contact: info at sigram dot com
Paul
Re: LocalFileSystem , LinkDbReader and workingDir
Posted by Andrzej Bialecki <ab...@getopt.org>.
Paul Dhaliwal wrote:
> Hello,
>
> There is an issue with the way LocalFileSystem.pathToFile(Path path)
> function works.
>
> It uses the workingDir member. It is computed from
> System.getProperty("user.home"); when the constructor is called. It can be
> set, but LinkDbReader does not set it.
>
> This leads to a file not found exception when crawl folder is not in
> user's home.
>
> The way I understand it, crawl directory is "pretty much" the working
> directory for nutch segments, crawls, and everything else. Should crawl
> directory be set as the working directory for the file system?
This is purely a convention - in fact, I've seen many cases where DBs,
segments and indexes were put in completely different places for
operational reasons. user.home is at least predictable ...
>
> I am not using LinkDbReader, but I am using the code from LinkDbReader
> and I am not calling the LinkDbReader main.
>
> I can work around the issue, but would appreciate some direction on
> how to go about it.
Since you are using LinkDbReader in a non-standard way, I think it's best
if you implement a workaround for this.
BTW, when using both the local FS and Hadoop DFS, I long ago adopted the
practice of using absolute paths for all arguments ... ;)
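A minimal sketch of that practice, assuming Java 7+ java.nio.file (this is not a Hadoop or Nutch API): every path argument is turned into an absolute, normalized path up front, resolved against the directory the command was launched from (the JVM's user.dir). After that, whatever working directory the file system uses internally no longer matters. The AbsoluteArgs class is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that absolutizes command-line path arguments. */
final class AbsoluteArgs {
    private AbsoluteArgs() {}

    /**
     * Returns each argument as an absolute, normalized path string.
     * Relative arguments are resolved against the JVM's current directory
     * (user.dir), which is what a user typing a relative path at the shell
     * expects; absolute arguments are only normalized.
     */
    static List<String> absolutize(String... args) {
        List<String> out = new ArrayList<>();
        for (String arg : args) {
            Path p = Paths.get(arg).toAbsolutePath().normalize();
            out.add(p.toString());
        }
        return out;
    }
}
```

The resulting strings can then be passed to whichever Path-based file system API the tool uses, so relative arguments never fall back to user.home.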
--
Best regards,
Andrzej Bialecki <><