Posted to solr-user@lucene.apache.org by Phillip Farber <pf...@umich.edu> on 2009/10/08 20:27:51 UTC
Optimization of large shard succeeded
I thought I'd summarize the method that solved the problem we were having
optimizing a large shard, which kept running out of disk space during the
optimize: df=100% (400g), du=~380g. After we ran out of space, if we
restarted Tomcat, segment files disappeared from disk, leaving 3 segments.
What worked: we used the <optimize maxSegments=... functionality to
optimize in stages, stepping maxSegments down through powers of 2: 16, 8,
4, 2, 1. We did not see merged segment files from previous generations
left on disk. The staged optimize was as fast as optimizing once to a
single segment, which was the case that ran out of space.
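For anyone who wants to script the stages, here is a minimal sketch in Python. It assumes the standard XML update handler; the URL, the function names, and the default starting value of 16 are illustrative assumptions, not something taken from our setup. It also assumes each optimize request blocks until that stage has finished, so check that against your own configuration before relying on it.

```python
import urllib.request

# Hypothetical update-handler URL; adjust host, port, and path for your install.
SOLR_UPDATE_URL = "http://localhost:8080/solr/update"


def stages(start=16):
    """Return segment targets, halving from `start` down to 1."""
    targets = []
    n = start
    while n >= 1:
        targets.append(n)
        n //= 2
    return targets


def optimize_message(max_segments):
    """Build the XML update message for one optimize stage."""
    return '<optimize maxSegments="%d"/>' % max_segments


def run_staged_optimize(url=SOLR_UPDATE_URL, start=16):
    """POST one optimize message per stage, one stage at a time."""
    for n in stages(start):
        req = urllib.request.Request(
            url,
            data=optimize_message(n).encode("utf-8"),
            headers={"Content-Type": "text/xml; charset=utf-8"},
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()  # drain the response before starting the next stage
```

Calling run_staged_optimize() would send the five messages for 16, 8, 4, 2, 1 in order. Watching df between stages is still a good idea, since this sketch does no free-space checking of its own.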
We were not adding documents to the index. We committed before doing the
staged optimize. We do not delete documents. We do not use
replication/distribution/snapshooter. We do not autocommit.
Sizes: 400g LVM volume; shard of 192g in 30 segments before the optimize,
188g once optimized.
solrconfig:
<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<maxMergeDocs>2147483647</maxMergeDocs>
<maxFieldLength>10000000</maxFieldLength>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>
<deletionPolicy class="solr.SolrDeletionPolicy">
<str name="keepOptimizedOnly">false</str>
<str name="maxCommitsToKeep">1</str>
</deletionPolicy>
schema:
<field name="id" type="string" indexed="true" stored="true"
required="true"/>
<field name="ocr" type="CommonGramTest" indexed="true" stored="false"
required="true"/>
<field name="title" type="string" indexed="true" stored="true"
multiValued="true" required="true"/>
<field name="rights" type="sint" indexed="true" stored="true"
required="true"/>
<field name="author" type="string" indexed="true" stored="true"
multiValued="true"/>
<field name="date" type="string" indexed="true" stored="true"/>
Phil
hathitrust.org