Posted to user@nutch.apache.org by Frank Kempf <fl...@2112portals.com> on 2006/09/20 18:38:59 UTC
Cannot generate all injected URLS
Hello,
I got stuck with generating.
Injecting 3200 URLs into the database and then generating always leads to
the same result: 1632 URLs in crawl_generate.
(I checked the db and it actually has 3200 entries.)
It makes no difference whether I try -topN 5000, -topN 50000, or no -topN at all.
How can I generate the whole set of first-level URLs?
Kind regards
Frank
Re: Cannot generate all injected URLS
Posted by Dennis Kubes <nu...@dragonflymc.com>.
What was the problem?
Dennis
Frank Kempf wrote:
> solved
>
> THX
Re: Cannot generate all injected URLS
Posted by Frank Kempf <fl...@2112portals.com>.
solved
THX
Re: Cannot generate all injected URLS
Posted by Sami Siren <ss...@gmail.com>.
Are you running in non-clustered mode? If so, run with the parameter
-numFetchers 1 and you should get all the URLs.
Perhaps we should fix this by adding a check in the generator:
if the task is run with the local job runner, that param should be forced to 1
(now it defaults to job.getNumMapTasks(), which defaults to 2).
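
A rough sketch of what such a check might look like, in plain Java. This is not the actual Nutch Generator code: the class and method names are hypothetical, and it assumes the local job runner can be recognised by the Hadoop "mapred.job.tracker" setting having the value "local". The partition() helper only illustrates splitting a URL list by hash; it is not Nutch's real partitioner.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; not taken from the Nutch source. */
public class GeneratorPartitionSketch {

    /**
     * Chooses how many fetch lists (partitions) to generate.
     *
     * @param jobTracker  assumed value of "mapred.job.tracker"; "local" means local job runner
     * @param requested   value passed via -numFetchers, or -1 if the option was not given
     * @param numMapTasks stand-in for job.getNumMapTasks() (2 by default, per the thread)
     */
    static int effectiveNumFetchers(String jobTracker, int requested, int numMapTasks) {
        if ("local".equals(jobTracker)) {
            // Proposed check: with the local job runner, always force a single partition.
            return 1;
        }
        // Otherwise keep the current behaviour: explicit value, else the map task count.
        return requested > 0 ? requested : numMapTasks;
    }

    /** Splits URLs into 'parts' lists by hash code (illustration only). */
    static List<List<String>> partition(List<String> urls, int parts) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < parts; i++) {
            out.add(new ArrayList<>());
        }
        for (String url : urls) {
            // Mask the sign bit so the index is never negative.
            int idx = (url.hashCode() & Integer.MAX_VALUE) % parts;
            out.get(idx).add(url);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 3200; i++) {
            urls.add("http://example" + i + ".com/"); // made-up URLs
        }
        int parts = effectiveNumFetchers("local", -1, 2);
        System.out.println("partitions: " + parts);
        System.out.println("urls in first partition: " + partition(urls, parts).get(0).size());
    }
}
```

With a single partition, every URL lands in the same list. With two partitions, each list holds only part of the input, which would be consistent with seeing roughly half of the 3200 URLs in crawl_generate.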
--
Sami Siren
Frank Kempf wrote:
> Hello,
>
> got stuck with generating.
> Injecting 3200 Urls into the database and generating afterwards leads
> always to the same result of having 1632 Urls in crawl_generate.
> (I checked the db and it actually has 3200 entries).
> No matter if I try -topN 5000 / 50000 or nothing.
> How could I generate a whole set of first level Urls?
>
>
> Kind regards
>
> Frank