Posted to user@nutch.apache.org by Amna Waqar <am...@gmail.com> on 2011/02/02 12:46:35 UTC

help with readseg

I am using the following command:
[root@Amna search]# bin/nutch readseg -dump /user/root/crawl/segments/20110124205537/ amna_out

but the output is:
SegmentReader: dump segment: /user/root/crawl/segments/20110124205537
SegmentReader: done

How can I view the readseg output in the amna_out temporary directory? I
cannot find this directory anywhere. Having read the SegmentReader code, I
see no println or flush call in the dump method, so how can we see its
output?

I can see the output of the readseg -get command, which is for a single URL.

Re: help with readseg

Posted by Amna Waqar <am...@gmail.com>.
I have found that directory. It is in the HDFS path
/user/root/crawl/
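
For anyone who hits the same problem: when Nutch is configured against HDFS,
as appears to be the case here, the output path given to readseg is resolved
on HDFS rather than on the local disk, which would explain why amna_out does
not show up in the local working directory. Below is a minimal sketch of
listing the output directory and copying the dump file to local disk. It
assumes a `hadoop` client is on the PATH, which the thread does not state.
The helper name fetch_dump and the HADOOP_CMD variable are hypothetical and
used only for illustration.

```shell
#!/bin/sh
# Sketch only: assumes Nutch writes to HDFS and that a Hadoop client is
# available. HADOOP_CMD and fetch_dump are illustrative names, not part of
# Nutch or Hadoop.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

# Usage: fetch_dump <hdfs_output_dir> <local_dir>
fetch_dump() {
    hdfs_out="$1"
    local_dir="$2"

    # List the readseg output directory on HDFS; it should contain a file
    # named 'dump'.
    "$HADOOP_CMD" fs -ls "$hdfs_out" || return 1

    # Copy the dump file to the local filesystem so it can be read with
    # ordinary tools such as less or grep.
    "$HADOOP_CMD" fs -get "$hdfs_out/dump" "$local_dir/dump"
}
```

Call it with whichever HDFS path the output directory turned up under, for
example `fetch_dump <hdfs path of amna_out> .`, and then open ./dump locally.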


Re: help with readseg

Posted by Amna Waqar <am...@gmail.com>.
I do not see any directory named amna_out in the current working directory
from which the command is launched. Why is this so? I am using Nutch 1.2.



Re: help with readseg

Posted by Arjun Kumar Reddy <ch...@iiitb.net>.
Hi Amna,

The output folder 'amna_out' will be created in the directory from where you
are running the readseg command. In that directory, you'll find a file named
'dump'. You can get the contents of crawled pages from it.

Thanks and regards,
Ch. Arjun Kumar Reddy

