Posted to user@nutch.apache.org by "simpleliving016@gmail.com" <si...@gmail.com> on 2014/03/13 05:33:57 UTC
Where is the crawl directory stored.
Hello
While running Nutch over Hadoop we pass in the crawl directory with the -dir parameter, which I assume stores the segments and other data structures.
However, I could not find this directory in the Hadoop tmp directory, which is /tmp/hadoop-username. Could someone let me know where it is stored? Or is there some renaming going on here?
Thanks.
Re: Where is the crawl directory stored.
Posted by "S.L" <si...@gmail.com>.
I am sorry, I am not sure I understand that, Markus. I am submitting Nutch
using the following:
/opt/hadoop-2.3.0/bin/hadoop jar
/opt/dfconfig/nutch/apache-nutch-1.8-SNAPSHOT.job
org.apache.nutch.crawl.Crawl /urls -dir crawldirectory123 -depth 1000 topN
30000
However, I don't see crawldirectory123 being stored anywhere. I looked for
it under the /tmp/hadoop-user folder, but no luck. Any idea where this is
stored? Is it part of the namenode or datanode in YARN?
On Thu, Mar 13, 2014 at 4:48 AM, Markus Jelsma
<ma...@openindex.io> wrote:
> Well, there is some crawl/ dir somewhere, is it not? Segments are in there.
RE: Where is the crawl directory stored.
Posted by Markus Jelsma <ma...@openindex.io>.
Well, there is some crawl/ dir somewhere, is it not? Segments are in there.
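One place that may be worth checking, assuming the job runs against HDFS (that is, fs.defaultFS is an hdfs:// URI): Hadoop normally resolves a relative path such as crawldirectory123 against the submitting user's HDFS home directory, /user/<username>, rather than against the local /tmp/hadoop-username directory. The sketch below is only an illustration of that rule. The helper name expected_crawl_path is hypothetical, and the /user/<username> layout is the Hadoop default, which a cluster may override.

```shell
#!/bin/sh
# Assumption: the default file system is HDFS, so a relative -dir value
# resolves against the HDFS home directory /user/<username>.

# Print the HDFS path a -dir argument would be expected to resolve to.
# $1 = value passed to -dir
# $2 = Hadoop user name (optional, defaults to the current OS user)
# expected_crawl_path is a hypothetical helper, not part of Nutch or Hadoop.
expected_crawl_path() {
    dir=$1
    user=${2:-$(id -un)}
    case $dir in
        /*) printf '%s\n' "$dir" ;;                  # absolute path: used as given
        *)  printf '/user/%s/%s\n' "$user" "$dir" ;; # relative path: under the home dir
    esac
}

# Only query the cluster when the hadoop client is on PATH.
if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls "$(expected_crawl_path crawldirectory123)"
fi
```

If the listing shows crawldb and segments subdirectories, that is the crawl directory. Running the listing on the bare relative name (hadoop fs -ls crawldirectory123) should point at the same location under the same assumption.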