Posted to solr-user@lucene.apache.org by Arun Rangarajan <ar...@gmail.com> on 2014/02/10 07:41:54 UTC

Is 'optimize' necessary for a 45-segment Solr 4.6 index?

I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an
'out of memory' error. Is optimize really necessary, since I read that
Lucene is able to handle multiple segments well now?

Re: Is 'optimize' necessary for a 45-segment Solr 4.6 index?

Posted by Arun Rangarajan <ar...@gmail.com>.
Dear Shawn,
Thanks for your reply. For now, I did the merges in steps with the maxSegments
param (using HOST:PORT/CORE/update?optimize=true&maxSegments=10). First I
merged the 45 segments down to 10, and then from 10 to 5. (Merging from 5 to 2
again caused an out-of-memory exception.) Now I have a 5-segment index with
all segments of roughly equal size. I'll try that and see whether it is good
enough for us.
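As curl commands, the stepwise merge above looks like this (HOST, PORT, and
CORE are placeholders for your own deployment, following the URL pattern
mentioned earlier):

```shell
# Placeholders: replace HOST, PORT, and CORE with your own values.
# Step 1: merge the 45 segments down to 10.
curl "http://HOST:PORT/CORE/update?optimize=true&maxSegments=10"
# Step 2: merge the 10 segments down to 5.
curl "http://HOST:PORT/CORE/update?optimize=true&maxSegments=5"
```

Each call blocks until the merge finishes, so the steps run one after another.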


On Sun, Feb 9, 2014 at 11:22 PM, Shawn Heisey <so...@elyograg.org> wrote:

> On 2/9/2014 11:41 PM, Arun Rangarajan wrote:
> > I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an
> > 'out of memory' error. Is optimize really necessary, since I read that
> > Lucene is able to handle multiple segments well now?
>
> I have had indexes with more than 45 segments, because of the merge
> settings that I use.  My large index shards are about 16GB at the
> moment.  Out of memory errors are very rare because I use a fairly large
> heap, at 6GB for a machine that hosts three of these large shards.  When
> I was still experimenting with my memory settings, I did see occasional
> out of memory errors during normal segment merging.
>
> Increasing your heap size is pretty much required at this point.  I've
> condensed some very basic information about heap sizing here:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> As for whether optimizing on 4.x is necessary: I do not have any hard
> numbers for you, but I can tell you that an optimized index does seem
> noticeably faster than one that is freshly built and has a large
> number of relatively large segments.
>
> I optimize my index shards on a schedule, but it is relatively
> infrequent -- one large shard per night.  Most of the time what I have
> is one really large segment and a bunch of super-small segments, and
> that does not seem to suffer from performance issues compared to a fully
> optimized index.  The situation is different right after a fresh
> rebuild, which produces a handful of very large segments and a bunch of
> smaller segments of varying sizes.
>
> Interesting but probably irrelevant details:
>
> Although I don't use mergeFactor any more, the TieredMergePolicy
> settings that I use are equivalent to a mergeFactor of 35.  I chose this
> number back in the 1.4.1 days because it resulted in synchronicity
> between merges and Lucene segment names when LogByteSizeMergePolicy was
> still in use.  Segments _0 through _z would be merged into segment _10,
> and so on.
>
> Thanks,
> Shawn
>
>

Re: Is 'optimize' necessary for a 45-segment Solr 4.6 index?

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/9/2014 11:41 PM, Arun Rangarajan wrote:
> I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an
> 'out of memory' error. Is optimize really necessary, since I read that
> Lucene is able to handle multiple segments well now?

I have had indexes with more than 45 segments, because of the merge
settings that I use.  My large index shards are about 16GB at the
moment.  Out of memory errors are very rare because I use a fairly large
heap, at 6GB for a machine that hosts three of these large shards.  When
I was still experimenting with my memory settings, I did see occasional
out of memory errors during normal segment merging.

Increasing your heap size is pretty much required at this point.  I've
condensed some very basic information about heap sizing here:

http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
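As a sketch of what raising the heap looks like on a standalone Solr 4.x
install started from the example directory (the 6g figure mirrors the heap
size mentioned above; the right value for you depends on your index and
cache sizes):

```shell
# Solr 4.x example start with an explicit 6 GB heap.
# Setting -Xms equal to -Xmx avoids heap-resize pauses.
# Run from the example/ directory of the Solr download.
java -Xms6g -Xmx6g -jar start.jar
```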

As for whether optimizing on 4.x is necessary: I do not have any hard
numbers for you, but I can tell you that an optimized index does seem
noticeably faster than one that is freshly built and has a large
number of relatively large segments.

I optimize my index shards on a schedule, but it is relatively
infrequent -- one large shard per night.  Most of the time what I have
is one really large segment and a bunch of super-small segments, and
that does not seem to suffer from performance issues compared to a fully
optimized index.  The situation is different right after a fresh
rebuild, which produces a handful of very large segments and a bunch of
smaller segments of varying sizes.

Interesting but probably irrelevant details:

Although I don't use mergeFactor any more, the TieredMergePolicy
settings that I use are equivalent to a mergeFactor of 35.  I chose this
number back in the 1.4.1 days because it resulted in synchronicity
between merges and Lucene segment names when LogByteSizeMergePolicy was
still in use.  Segments _0 through _z would be merged into segment _10,
and so on.
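For reference, a solrconfig.xml sketch of TieredMergePolicy settings
equivalent to a mergeFactor of 35 (the element and setting names are the
standard Solr 4.x ones; the value 35 is the choice described above) might
look like:

```xml
<!-- Inside <indexConfig> in solrconfig.xml. -->
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <!-- Maximum number of segments a single merge combines at once. -->
  <int name="maxMergeAtOnce">35</int>
  <!-- How many similar-sized segments may accumulate per tier
       before a merge is triggered. -->
  <int name="segmentsPerTier">35</int>
</mergePolicy>
```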

Thanks,
Shawn