Posted to solr-user@lucene.apache.org by Mark Miller <ma...@gmail.com> on 2009/07/14 15:22:46 UTC

Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another

What merge factor are you using now? The merge factor will influence the
number of files that are created as the index grows. Lower = fewer file
descriptors needed, but also slower bulk indexing.
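(As a rough back-of-the-envelope sketch - the exact numbers depend on
your Lucene version and schema: a non-compound segment is typically
~8-10 files, more if you store term vectors. With mergeFactor=10 you
can have up to 10 segments at each size level, and a bulk load can
easily keep 3-4 levels alive at once, so 300-400 descriptors for the
writer alone is plausible, before counting the searchers holding files
open on the second core.)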
You could raise the max open files limit on your OS.

You could also use
    <!-- options specific to the main on-disk lucene index -->
    <useCompoundFile>true</useCompoundFile>

This writes each segment's files into a single compound file and requires
*way* fewer file handles (slightly slower indexing).
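For reference, in a stock Solr 1.x solrconfig.xml these settings live in
the <indexDefaults> / <mainIndex> section; a minimal sketch (the values
here are illustrative, not recommendations):

    <mainIndex>
        <!-- one compound (.cfs) file per segment instead of many -->
        <useCompoundFile>true</useCompoundFile>
        <!-- lower = fewer segments and open files, slower bulk indexing -->
        <mergeFactor>2</mergeFactor>
    </mainIndex>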

It would normally be odd to hit something like that after only 50,000
documents, but a doc with 300 fields is certainly not the norm ;) Anything
else special about your setup?

-- 
- Mark

http://www.lucidimagination.com

On Tue, Jul 14, 2009 at 12:49 PM, Bruno Aranda <br...@gmail.com> wrote:

> Hi,
>
> We are hitting a TooManyOpenFiles exception in our indexing process. We
> are reading data from a database and indexing it into one of the two
> cores of our Solr instance. Each core has a different schema, as they
> are used for different purposes. While we index into the first core, we
> do many searches against the second core, as it contains data to
> "enrich" what we index (the second core is never modified - read-only).
> After indexing about 50,000 documents (about 300 fields each) we get
> the exception. If we run the same process without the "enrichment"
> (no queries against the second core), everything goes fine.
> We are using Spring Batch, and we only commit+optimize at the very
> end, as we don't need to search the data while it is being indexed.
>
> I have seen recommendations ranging from committing+optimizing more
> often to lowering the merge factor. How does the merge factor affect
> things in this scenario?
>
> Thanks,
>
> Bruno
>

Re: TooManyOpenFiles: indexing in one core, doing many searches at the same time in another

Posted by Bruno Aranda <br...@gmail.com>.
Hi, my process is:

I index 600,000 docs in the secondary core (each doc has 5 fields). No
problem with that. After this core is indexed (and optimized) it is
used only for searches while the main core is being indexed.
Currently, I am using a mergeFactor of 10 for the main core. I will try
2 to see if it helps, and set useCompoundFile to true. I guess I don't
need to modify anything in the secondary core, as it is only used for
searches.
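(Concretely, assuming the stock <mainIndex> layout, the change to try
in the main core's solrconfig.xml would look like:

    <mainIndex>
        <useCompoundFile>true</useCompoundFile>
        <mergeFactor>2</mergeFactor>
    </mainIndex>

with the secondary core left untouched.)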

Thanks for your answers,

Bruno
