Posted to user@nutch.apache.org by "simpleliving016@gmail.com" <si...@gmail.com> on 2014/03/13 05:33:57 UTC

Where is the crawl directory stored.

Hello

While running Nutch over Hadoop, we pass in the crawl directory with the -dir parameter, which I assume stores the segments and other data structures.

However, I could not find this directory in the Hadoop tmp directory, which is /tmp/hadoop-username. Could someone let me know where it is stored? Or is there some renaming going on here?

Thanks.



Re: Where is the crawl directory stored.

Posted by "S.L" <si...@gmail.com>.
I am sorry, I am not sure I understand that, Markus. I am submitting Nutch
using the following.

/opt/hadoop-2.3.0/bin/hadoop jar
/opt/dfconfig/nutch/apache-nutch-1.8-SNAPSHOT.job
org.apache.nutch.crawl.Crawl /urls -dir crawldirectory123 -depth 1000 topN
30000

However, I don't see crawldirectory123 being stored anywhere. I looked
for it under the /tmp/hadoop-user folder, but no luck. Any idea where it is
stored? Is it part of the namenode or datanode in YARN?
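
A note on where a relative -dir value usually ends up, offered as a sketch rather than a confirmed answer from this thread. Assuming the job runs against HDFS (fs.defaultFS in core-site.xml points at a namenode, not the local filesystem), a relative path such as crawldirectory123 is resolved against the submitting user's HDFS home directory, which is /user/<username> by default. With default settings the datanode keeps HDFS data under /tmp/hadoop-<username> as raw block files, so the directory would not be visible there by name. The helper resolve_hdfs_path below is hypothetical and only illustrates the resolution rule; hdfs dfs -ls is the standard client command for listing the result.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch or Hadoop): show where a path passed
# via -dir would be looked up on HDFS, assuming the default home directory
# layout of /user/<username>.
resolve_hdfs_path() {
  dir="$1"    # value passed to -dir
  user="$2"   # user that submitted the job
  case "$dir" in
    /*) printf '%s\n' "$dir" ;;                   # absolute paths are used as-is
    *)  printf '/user/%s/%s\n' "$user" "$dir" ;;  # relative paths resolve against the home directory
  esac
}

crawl_path=$(resolve_hdfs_path crawldirectory123 "$(whoami)")
echo "Expected HDFS location: $crawl_path"

# List the directory through the HDFS client, only if it is installed.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -ls "$crawl_path" || echo "Not found; check fs.defaultFS in core-site.xml"
fi
```

If the listing comes back empty, the job may have run in local mode, in which case the directory would be relative to the working directory the job was launched from.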


On Thu, Mar 13, 2014 at 4:48 AM, Markus Jelsma
<ma...@openindex.io> wrote:

> Well, there is some crawl/ dir somewhere, is it not? Segments are in there.

RE: Where is the crawl directory stored.

Posted by Markus Jelsma <ma...@openindex.io>.
Well, there is some crawl/ dir somewhere, is it not? Segments are in there.
 
-----Original message-----
> From:simpleliving016@gmail.com <si...@gmail.com>
> Sent: Thursday 13th March 2014 5:34
> To: user@nutch.apache.org
> Subject: Where is the crawl directory stored.
> 
> Hello
> 
> While running Nutch over Hadoop we pass in the crawl directory with the dir parameter , which I assume stores the segment and other data structures. 
> 
> However I could not find this directory in the hadoop tmp directory which is /tmp/hadoop-username. Could some one let me know where it is stored?Or is there some renaming going on here?
> 
> Thanks.
> 
> Sent from my HTC
> 
>