Posted to user@nutch.apache.org by Tom Landvoigt <to...@linklift.de> on 2010/02/01 14:58:25 UTC

Generate of Segments

Hi,

I am using Nutch 1.0, mainly for crawling.

I want to generate segments with a fixed size, e.g. 1000 URLs. Each
segment should contain only uncrawled URLs and the URLs that have been
waiting longest for recrawling.

Can anyone give me a hint on where I should tackle the problem?

Thanks a lot,

Tom


Re: Generate of Segments

Posted by xiao yang <ya...@gmail.com>.
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
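
As far as I know, -topN caps the segment at the N top-ranked entries that are due for fetching, which is not necessarily the same as "waiting longest". The selection rule asked for in the question can be sketched outside Nutch as below. This is only an illustration of the ordering, not how the Nutch Generator works: the tab-separated input format (URL, then last fetch time in epoch seconds, with 0 meaning never fetched) and the helper name select_longest_waiting are assumptions made up for this example, and it treats "waiting longest" as "oldest last fetch time".

```shell
# Hypothetical helper, not part of Nutch.
# Reads lines of "url<TAB>last_fetch_epoch" on stdin (0 = never fetched)
# and prints the N URLs with the oldest fetch time, so uncrawled URLs
# come first, followed by the ones fetched longest ago.
select_longest_waiting() {
  n=$1
  # Sort numerically on the second (tab-separated) field, keep N lines,
  # then print only the URL column.
  sort -t "$(printf '\t')" -k2,2n | head -n "$n" | cut -f1
}

# Example: pick 2 of 3 URLs.
printf 'http://a.example/\t300\nhttp://b.example/\t0\nhttp://c.example/\t100\n' \
  | select_longest_waiting 2
```

With the sample input, the never-fetched URL (b) is listed first and the oldest fetched URL (c) second; the most recently fetched URL (a) is left out.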

