Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2012/05/03 08:32:16 UTC

Re: Generator OOM

 FYI, I checked the code and it is indeed the host or domain limit that is
 responsible for the OOM. The score is fine, as it is not being accumulated
 anyway. You can work around the problem by increasing the heap space
 allocated to the reducers, by significantly increasing the number of
 reducers, or by slightly increasing the host or domain limit value.
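
 To illustrate why adding reducers can help, here is a minimal Java sketch.
 It is a simplified model rather than the actual Generator code: it assumes
 that enforcing a per-host limit requires each reducer to keep one counter
 per distinct host it sees, and that hosts are spread over reducers by hash.
 The names HostLimitSketch and maxCountersPerReducer are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only; this is NOT Nutch's Generator code.
public class HostLimitSketch {

    // Returns the largest number of per-host counters that any single
    // reducer has to hold when hosts are assigned to reducers by hash
    // (an assumed partitioning scheme).
    static int maxCountersPerReducer(List<String> hosts, int numReducers) {
        List<Map<String, Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new HashMap<>());
        }
        for (String host : hosts) {
            // floorMod keeps the index non-negative for negative hash codes.
            int r = Math.floorMod(host.hashCode(), numReducers);
            // One counter per distinct host; repeats only bump the count.
            reducers.get(r).merge(host, 1, Integer::sum);
        }
        int max = 0;
        for (Map<String, Integer> counts : reducers) {
            max = Math.max(max, counts.size());
        }
        return max;
    }

    public static void main(String[] args) {
        // Synthetic host names, made up for the demonstration.
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            hosts.add("host" + i + ".example.org");
        }
        System.out.println("1 reducer:   " + maxCountersPerReducer(hosts, 1));
        System.out.println("16 reducers: " + maxCountersPerReducer(hosts, 16));
    }
}
```

 With a single reducer, that reducer holds a counter for every distinct
 host. Spreading the same hosts over more reducers shrinks the largest
 per-reducer table, which under this model is the reason the second
 workaround lowers the memory needed per reducer.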

 On Thu, 26 Apr 2012 21:02:58 +0200, Markus Jelsma 
 <ma...@openindex.io> wrote:
> Hi,
>
> We sometimes see the generator running OOM. This happens because we
> either have too high a topN value or too many segments to generate.
> In any case, a very large number of records are generated with the
> same (lowest) score and end up in a single reducer. We limit the
> generator by domain, which may be a source of trouble.
>
> I've not yet found a way around this problem, so I'm looking for
> suggestions.
>
> Thanks,
> Markus