Posted to java-user@lucene.apache.org by Itay Adler <it...@walkme.com> on 2017/10/24 07:33:13 UTC

Increasing segment maxDoc limitation

Hey everyone,

We have a use case for Solr+Lucene where we index a large number of small
documents, and it performs quite well even as we approach the document
count limit in Lucene. I was wondering whether there are any plans to
increase the 2^31-1 doc limit, and if not, what are the things to look
out for when applying this change?

Cheers,
Itay
-- 
Itay Adler

Re: Increasing segment maxDoc limitation

Posted by Adrien Grand <jp...@gmail.com>.
I don't think there are any short-term plans to remove this limitation, the
current answer to this problem is to partition your index into multiple
shards that can be searched independently. Then you can use TopDocs.merge
to merge results that come from your shards.
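To illustrate the merge step, here is a small self-contained sketch of what TopDocs.merge does conceptually: each shard returns its own hits sorted by descending score, and the merge restores a global order, breaking score ties by shard index and then doc id. The ScoredHit record and the shard lists are illustrative stand-ins, not Lucene classes; the real API takes TopDocs[] and uses a priority queue rather than a flat sort.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShardMergeSketch {
    // Stand-in for a (ScoreDoc + shardIndex) pair; not a Lucene class.
    record ScoredHit(int shard, int doc, float score) {}

    // Conceptual equivalent of TopDocs.merge(topN, shardHits):
    // order all shard hits by score desc, tie-break by shard then doc,
    // and keep the global top N.
    static List<ScoredHit> merge(int topN, List<List<ScoredHit>> shardHits) {
        List<ScoredHit> all = new ArrayList<>();
        for (List<ScoredHit> hits : shardHits) {
            all.addAll(hits);
        }
        all.sort(Comparator.comparingDouble((ScoredHit h) -> -h.score())
                .thenComparingInt(h -> h.shard())
                .thenComparingInt(h -> h.doc()));
        return all.subList(0, Math.min(topN, all.size()));
    }

    public static void main(String[] args) {
        List<ScoredHit> shard0 = List.of(
                new ScoredHit(0, 3, 2.0f), new ScoredHit(0, 7, 1.0f));
        List<ScoredHit> shard1 = List.of(new ScoredHit(1, 1, 1.5f));
        List<ScoredHit> merged = merge(2, List.of(shard0, shard1));
        System.out.println(merged.get(0).score() + " " + merged.get(1).score());
        // prints: 2.0 1.5
    }
}
```

In real code you would run the same Query against each shard's IndexSearcher, collect the per-shard TopDocs, and pass them to TopDocs.merge.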

In my opinion, if we were to allow more than 2B documents in the
future, it would be easier to support by still enforcing at most 2^31-1
documents per segment (which is convenient internally so that we can use
bitsets for live docs, byte[] for norms, etc.) but allowing indices to have
2^31 documents or more overall via multiple segments. 2^31 documents is
very likely to get you way past the maximum segment size of the merge
policy anyway, so there is not much value to supporting such large numbers
of documents on a per-segment basis.
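A hypothetical sketch of that design: segments stay int-addressed (so bitsets and byte[] arrays keep working), while an index-wide doc number becomes a long that resolves to a (segment, segment-local int doc) pair via cumulative per-segment bases. This mirrors in spirit how Lucene today resolves composite-reader doc IDs through per-leaf docBase values, except that real Lucene doc IDs are ints; the class and method names below are invented for illustration.

```java
public class GlobalDocIdSketch {
    // bases[i] is the global number of the first doc in segment i,
    // i.e. the cumulative maxDoc of all earlier segments.
    static long[] bases(int[] segmentMaxDocs) {
        long[] bases = new long[segmentMaxDocs.length];
        long sum = 0;
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            bases[i] = sum;
            sum += segmentMaxDocs[i];
        }
        return bases;
    }

    // Binary-search the segment containing globalDoc; returns
    // { segment ordinal, segment-local doc }. The local doc always
    // fits in an int because each segment holds at most 2^31-1 docs.
    static long[] resolve(long globalDoc, long[] bases) {
        int lo = 0, hi = bases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (bases[mid] <= globalDoc) lo = mid; else hi = mid - 1;
        }
        return new long[] { lo, globalDoc - bases[lo] };
    }

    public static void main(String[] args) {
        // Two full-size segments plus a small one: more than 2^31 docs overall.
        int[] maxDocs = { Integer.MAX_VALUE, Integer.MAX_VALUE, 1000 };
        long[] bases = bases(maxDocs);
        long global = 2L * Integer.MAX_VALUE + 5; // lands in segment 2
        long[] r = resolve(global, bases);
        System.out.println(r[0] + " " + r[1]); // prints: 2 5
    }
}
```

The resolve step is O(log segments), so the cost of crossing the 2^31 boundary would sit entirely in the composite layer, not inside any codec.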

On Tue, Oct 24, 2017 at 09:40, Itay Adler <it...@walkme.com> wrote:

> Hey everyone,
>
> We have a use case for Solr+Lucene where we index a large number of small
> documents, and it performs quite well even as we approach the document
> count limit in Lucene. I was wondering whether there are any plans to
> increase the 2^31-1 doc limit, and if not, what are the things to look
> out for when applying this change?
>
> Cheers,
> Itay
> --
> Itay Adler
>