Posted to user@nutch.apache.org by Frank Kempf <fl...@2112portals.com> on 2006/09/20 18:38:59 UTC

Cannot generate all injected URLS

Hello,

I am stuck at the generate step.
Injecting 3200 URLs into the database and then generating always gives 
the same result: 1632 URLs in crawl_generate.
(I checked the db and it actually has 3200 entries.)
It makes no difference whether I use -topN 5000, -topN 50000, or no -topN at all.
How can I generate the whole set of first-level URLs?


   Kind regards

     Frank


Re: Cannot generate all injected URLS

Posted by Dennis Kubes <nu...@dragonflymc.com>.
What was the problem?

Dennis

Frank Kempf wrote:
> solved
>
> THX

Re: Cannot generate all injected URLS

Posted by Frank Kempf <fl...@2112portals.com>.
solved

THX

Re: Cannot generate all injected URLS

Posted by Sami Siren <ss...@gmail.com>.
If you are running in non-clustered (local) mode, run generate with the 
parameter -numFetchers 1 and you should get all the URLs.

Perhaps we should fix this by adding a check in the generator:

if the task is run with the local job runner, that parameter should be 
forced to 1 (currently it defaults to job.getNumMapTasks(), which defaults to 2).
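
A minimal sketch of such a check, for illustration only. It is not the
actual Generator code: the class NumFetchersCheck, the method
resolveNumFetchers and its parameters are hypothetical, and it assumes the
local job runner can be recognised by a job tracker setting equal to
"local" and that a non-positive value means -numFetchers was not given.

```java
// Hypothetical helper, not part of Nutch: illustrates forcing the number of
// fetch lists to 1 when the job runs with the local job runner.
class NumFetchersCheck {

    /**
     * @param jobTracker  assumed job tracker setting; "local" is taken to mean
     *                    the local job runner
     * @param requested   value passed via -numFetchers, or <= 0 if not given
     * @param numMapTasks stand-in for job.getNumMapTasks(), the current default
     * @return the number of fetch lists the generator should produce
     */
    static int resolveNumFetchers(String jobTracker, int requested, int numMapTasks) {
        // Current behaviour: fall back to the number of map tasks.
        int numFetchers = requested > 0 ? requested : numMapTasks;

        // Proposed check: the local runner should always get a single list.
        if ("local".equals(jobTracker) && numFetchers != 1) {
            numFetchers = 1;
        }
        return numFetchers;
    }

    public static void main(String[] args) {
        // Local mode, no -numFetchers given, default of 2 map tasks.
        System.out.println(resolveNumFetchers("local", -1, 2));
        // Clustered mode (placeholder host name) keeps the default.
        System.out.println(resolveNumFetchers("jobtracker-host:9001", -1, 2));
    }
}
```

With a check like this, running generate in local mode without
-numFetchers would behave as if -numFetchers 1 had been given.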

--
  Sami Siren

Frank Kempf wrote:
> Hello,
> 
> got stuck with generating.
> Injecting 3200 Urls into the database and generating afterwards leads 
> always to the same result of having 1632 Urls in crawl_generate.
> (I checked the db and it actually has 3200 entries).
> No matter if I try -topN 5000 / 50000 or nothing.
> How could I generate a whole set of first level Urls?
> 
> 
>   Kind regards
> 
>     Frank
> 
>