Posted to user@nutch.apache.org by Cam Bazz <ca...@gmail.com> on 2011/07/07 17:18:42 UTC
inject will not take all the urls
Hello,
I am trying to inject a set of URLs, in the range of 800K. However, it
seems that only half of them are injected into the crawldb. (I am checking
with the -stats option.)
I wonder why?
Best Regards,
-C.B.
Re: inject will not take all the urls
Posted by Markus Jelsma <ma...@openindex.io>.
Check your URL filters. This is the most common pitfall with injection. Most
likely a fair number of your URLs are being removed by the filters.
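To see which seeds a filter would drop, something like the following may help.
It is only a rough sketch in Python, not Nutch code: it assumes rules in the
style of regex-urlfilter.txt (a leading + or - followed by a regex, first
matching rule wins, no match means the URL is rejected). The function names,
the example rules and the example URLs are all made up for illustration, and
Python's regex syntax is not identical to Java's, so treat the result as an
approximation of what the real filter chain does.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines; blank lines and '#' comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with + or -: %r" % line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First matching rule decides; a URL that matches no rule is rejected (assumed)."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


def split_seeds(rules, urls):
    """Return (kept, dropped) lists for the given seed URLs."""
    kept, dropped = [], []
    for url in urls:
        (kept if accepts(rules, url) else dropped).append(url)
    return kept, dropped


# Hypothetical rules, written only for this example.
EXAMPLE_RULES = r"""
# skip URLs containing characters that usually indicate a query
-[?*!@=]
# skip some binary suffixes
-\.(gif|jpg|png|zip)$
# accept anything else
+.
"""

if __name__ == "__main__":
    seeds = [
        "http://example.com/",
        "http://example.com/a?b=1",
        "http://example.com/x.jpg",
    ]
    kept, dropped = split_seeds(parse_rules(EXAMPLE_RULES), seeds)
    print("kept: %d, dropped: %d" % (len(kept), len(dropped)))
    for url in dropped:
        print("dropped:", url)
```

Running this over the real seed list with your own filter rules gives a dropped
count you can compare against the gap you see in -stats.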
On Thursday 07 July 2011 17:18:42 Cam Bazz wrote:
> Hello,
>
> I am trying to inject a set of urls, in range of 800K. however it
> seems that only half of them are injected to crawldb? (I am checking
> with -stats option)
>
> I wonder why?
>
> Best Regards,
> -C.B.
--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350