Posted to java-user@lucene.apache.org by Brandon Iles <il...@uber.com> on 2016/03/25 00:34:54 UTC

Optimal number of segments for a static index

Hi Lucene users,

The short version of my question: why am I seeing higher performance from a
multi-segment index than from a single-segment index?

I have a static index that's generated before serving begins, so real-time
updates and merges aren't an issue. While experimenting with ways to
increase system performance, I discovered, surprisingly, that I get higher
average QPS with more than one segment. I've also found that the
best-performing segment count changes with the size of the index.

For example, on a 6.5 GB index the optimal number of segments was 4, for a
65 GB index it was 8, and for a 109 GB index it was about 16. The
difference in average QPS has been 15-35%, so it's significant.

We also use an EarlyTerminatingSortingCollector with a sorted index, and
I've verified that queries terminate early when appropriate. Given that
Lucene searches each segment in sequence for the requested number of hits,
shouldn't the cost of a query grow linearly with the segment count, so
that performance drops as segments are added?
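The reasoning above can be made concrete with a toy cost model. This is a hypothetical sketch, not Lucene code: it assumes every document matches the query, each segment is already sorted in the query's sort order, collection in a segment stops after the requested number of hits, and segments are visited one after another. It ignores per-segment fixed overhead, caching, and any parallelism, and the index size and hit count are made-up values.

```java
import java.util.Arrays;

/** Hypothetical toy model of sequential per-segment early termination (not Lucene code). */
public class EarlyTerminationCostModel {

    /**
     * Documents visited when topN hits are requested from segments of the given
     * sizes, assuming all docs match and each segment stops after topN hits.
     */
    static long docsVisited(long[] segmentSizes, int topN) {
        long total = 0;
        for (long size : segmentSizes) {
            total += Math.min(size, topN); // a segment can't yield more docs than it holds
        }
        return total;
    }

    /** Splits totalDocs into numSegments segments whose sizes differ by at most one. */
    static long[] evenSegments(long totalDocs, int numSegments) {
        long[] sizes = new long[numSegments];
        Arrays.fill(sizes, totalDocs / numSegments);
        for (int i = 0; i < totalDocs % numSegments; i++) {
            sizes[i]++; // spread the remainder over the first segments
        }
        return sizes;
    }

    public static void main(String[] args) {
        long totalDocs = 10_000_000L; // hypothetical index size
        int topN = 100;               // hypothetical number of requested hits
        for (int segments : new int[] {1, 4, 8, 16}) {
            long visited = docsVisited(evenSegments(totalDocs, segments), topN);
            System.out.println(segments + " segment(s): " + visited + " docs visited");
        }
    }
}
```

Under this model the work per query is roughly topN times the segment count, so a single segment is always cheapest. That is exactly why the measured speedup from more segments is surprising.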

The indexes were warmed before the tests began. I also set the heap size
large enough not to be an issue while still leaving plenty of memory for
the file system to cache the index.

Any insights would be appreciated.

Thanks,
Brandon