Posted to common-dev@hadoop.apache.org by Stefan Groschupf <sg...@media-style.com> on 2006/03/01 23:17:37 UTC

scalability limits getDetails, mapFile Readers?

Hi,

We ran into a problem in Nutch using
MapFileOutputFormat#getReaders and getEntry.
In detail, this happens during summary generation, where for each
segment we open as many readers as there are parts (part-0000 to
part-n).
Having 80 tasktrackers and 80 segments means:
80 x 80 x 4 (parseData, parseText, content, crawl) = 25,600 open
readers. A search server also needs to open as many files as the
index searcher requires.
So the problem is a FileNotFoundException ("Too many open files").
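
To make the arithmetic concrete, this is roughly the pattern that
multiplies the open files. The class, method, and variable names below
are just an illustration of the pattern, not the actual Nutch code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.mapred.MapFileOutputFormat;

public class OpenFileCount {
  /** Opens one reader per part file, for each of the four per-segment dirs. */
  public static void openAll(FileSystem fs, Path[] segments, Configuration conf)
      throws IOException {
    String[] dirs = {"parseData", "parseText", "content", "crawl"};  // names from the mail
    for (Path segment : segments) {               // e.g. 80 segments
      for (String dir : dirs) {
        MapFile.Reader[] readers =
            MapFileOutputFormat.getReaders(fs, new Path(segment, dir), conf);
        // readers.length == number of parts (part-0000 ... part-n), e.g. 80,
        // so 80 segments x 80 parts x 4 dirs = 25,600 readers held open at once.
      }
    }
  }
}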

Opening and closing readers for each Detail makes no sense. We could
limit the number of open readers somehow and close the reader that
has gone unused the longest.
But I'm not that happy with this solution, so any thoughts on how we
can solve this problem in general?
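
For what it's worth, a minimal sketch of that LRU idea, assuming a
small wrapper around MapFile.Reader. The ReaderCache class and the
maxOpen parameter are made up for illustration, not an existing API:

import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;

/**
 * Hypothetical LRU cache for MapFile readers: keeps at most maxOpen
 * readers open and closes the least recently used one when the limit
 * is exceeded.
 */
public class ReaderCache {

  private final FileSystem fs;
  private final Configuration conf;
  private final Map<String, MapFile.Reader> readers;

  public ReaderCache(FileSystem fs, Configuration conf, final int maxOpen) {
    this.fs = fs;
    this.conf = conf;
    // access-order LinkedHashMap evicts the entry that was used longest ago
    this.readers = new LinkedHashMap<String, MapFile.Reader>(16, 0.75f, true) {
      protected boolean removeEldestEntry(Map.Entry<String, MapFile.Reader> eldest) {
        if (size() > maxOpen) {
          try {
            eldest.getValue().close();          // release the file handles
          } catch (IOException e) {
            // ignore: the reader is being discarded anyway
          }
          return true;                          // drop it from the map
        }
        return false;
      }
    };
  }

  /** Returns a reader for the given part directory, opening it lazily. */
  public synchronized MapFile.Reader getReader(String partDir) throws IOException {
    MapFile.Reader reader = readers.get(partDir);
    if (reader == null) {
      reader = new MapFile.Reader(fs, partDir, conf);
      readers.put(partDir, reader);
    }
    return reader;
  }
}

That still leaves the question of picking a good maxOpen per process,
which is why I'm not entirely happy with it.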

Thanks.
Stefan


---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com



Re: scalability limits getDetails, mapFile Readers?

Posted by Stefan Groschupf <sg...@media-style.com>.
Oops, sorry, my mistake. I will post it to nutch-dev instead.

On 02.03.2006 at 00:25, Doug Cutting wrote:

> Stefan,
>
> I think you meant to send this to nutch-dev, not hadoop-dev.
>
> Doug
>
> Stefan Groschupf wrote:
>> Hi,
>> We ran into a problem in Nutch using
>> MapFileOutputFormat#getReaders and getEntry.
>> In detail, this happens during summary generation, where for each
>> segment we open as many readers as there are parts (part-0000 to
>> part-n).
>> Having 80 tasktrackers and 80 segments means:
>> 80 x 80 x 4 (parseData, parseText, content, crawl) = 25,600 open
>> readers. A search server also needs to open as many files as the
>> index searcher requires.
>> So the problem is a FileNotFoundException ("Too many open files").
>> Opening and closing readers for each Detail makes no sense. We could
>> limit the number of open readers somehow and close the reader that
>> has gone unused the longest.
>> But I'm not that happy with this solution, so any thoughts on how we
>> can solve this problem in general?
>> Thanks.
>> Stefan
>> ---------------------------------------------
>> blog: http://www.find23.org
>> company: http://www.media-style.com
>

---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com



Re: scalability limits getDetails, mapFile Readers?

Posted by Doug Cutting <cu...@apache.org>.
Stefan,

I think you meant to send this to nutch-dev, not hadoop-dev.

Doug

Stefan Groschupf wrote:
> Hi,
> 
> We ran into a problem in Nutch using
> MapFileOutputFormat#getReaders and getEntry.
> In detail, this happens during summary generation, where for each
> segment we open as many readers as there are parts (part-0000 to
> part-n).
> Having 80 tasktrackers and 80 segments means:
> 80 x 80 x 4 (parseData, parseText, content, crawl) = 25,600 open
> readers. A search server also needs to open as many files as the
> index searcher requires.
> So the problem is a FileNotFoundException ("Too many open files").
> 
> Opening and closing readers for each Detail makes no sense. We could
> limit the number of open readers somehow and close the reader that
> has gone unused the longest.
> But I'm not that happy with this solution, so any thoughts on how we
> can solve this problem in general?
> 
> Thanks.
> Stefan
> 
> 
> ---------------------------------------------
> blog: http://www.find23.org
> company: http://www.media-style.com
> 
>