Posted to solr-user@lucene.apache.org by Phillip Farber <pf...@umich.edu> on 2009/10/08 20:27:51 UTC

Optimization of large shard succeeded

I thought I'd summarize the method that solved a problem we were having: 
optimizing a large shard ran out of disk space (df=100% on a 400g 
volume, du=~380g).  After we ran out of space, restarting tomcat made 
segment files disappear from disk, leaving 3 segments.

What worked: we used the <optimize maxSegments=... functionality to 
optimize in stages, halving maxSegments each time: 16, 8, 4, 2, 1. We 
did not see merged segment files from previous generations left on 
disk. The staged optimize was as fast as optimizing directly to a 
single segment, which was the case that ran out of space.

We were not adding documents to the index. We committed before doing the 
staged optimize. We do not delete documents. We do not use 
replication/distribution/snapshooter. We do not autocommit.
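
For anyone who wants to script the stages, here is a minimal sketch of the 
procedure described above, not the exact commands we ran. It assumes the 
standard XML update handler, that each optimize request returns only after 
its merge has finished, and a made-up URL (localhost:8080/solr/update) that 
you would replace with your own. The function names are hypothetical.

```python
import urllib.request

# Hypothetical update URL; adjust host, port, and path for your install.
SOLR_UPDATE_URL = "http://localhost:8080/solr/update"


def staged_targets(start=16):
    """Return maxSegments targets, halving from start down to 1."""
    targets = []
    n = start
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets


def optimize_message(max_segments):
    """Build the XML update message for one optimize stage."""
    return '<optimize maxSegments="%d"/>' % max_segments


def post_update(xml, url=SOLR_UPDATE_URL):
    """POST one XML update message and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def staged_optimize(start=16, url=SOLR_UPDATE_URL, send=post_update):
    """Commit once, then optimize in stages down to a single segment."""
    send("<commit/>", url)
    for n in staged_targets(start):
        send(optimize_message(n), url)
```

Calling staged_optimize() sends a commit followed by one optimize per 
stage. The send parameter is only there so the sequence can be checked 
without a running Solr.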

400g LVM volume; shard: 192g in 30 segments; optimized: 188g

solrconfig:

<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000000</maxFieldLength>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="keepOptimizedOnly">false</str>
    <str name="maxCommitsToKeep">1</str>
</deletionPolicy>

schema:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="ocr" type="CommonGramTest" indexed="true" stored="false" required="true"/>
<field name="title" type="string" indexed="true" stored="true" multiValued="true" required="true"/>
<field name="rights" type="sint" indexed="true" stored="true" required="true"/>
<field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="date" type="string" indexed="true" stored="true"/>


Phil
hathitrust.org