Posted to java-user@lucene.apache.org by Andres de la Peña <ad...@stratio.com> on 2016/12/08 13:50:44 UTC

Searching in multiple indexes with more than 2147483519 documents

Hi all,

A Lucene index can't contain more than 2147483519 documents, so we want to
split a larger dataset into multiple indexes. However, it is not possible to
create a MultiReader to search all the index partitions at once:

Too many documents: composite IndexReaders cannot exceed 2147483519 but
readers have total maxDoc=2171401446


What do you think is the best way to search several indexes containing
more than 2147483519 documents in total? Maybe searching each index and
merging the results in a MemoryIndex/RAMIndex?

Thanks,

-- 
Andrés de la Peña

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd
<https://twitter.com/StratioBD>*
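
For context on the numbers in the question: 2147483519 equals
Integer.MAX_VALUE - 128, which to my knowledge is the value Lucene exposes as
IndexWriter.MAX_DOCS. The sketch below is plain Java with no Lucene dependency;
the class name DocLimitSketch and the helper partitionsNeeded are hypothetical
names used only for illustration. It checks the reported maxDoc against the
limit and estimates how many partitions a dataset of that size would need.

```java
class DocLimitSketch {
    // Assumption: mirrors Lucene's per-index document limit (Integer.MAX_VALUE - 128).
    static final int MAX_DOCS = Integer.MAX_VALUE - 128;

    // Smallest number of indexes needed to hold totalDocs documents (ceiling division).
    static int partitionsNeeded(long totalDocs) {
        return (int) ((totalDocs + MAX_DOCS - 1) / MAX_DOCS);
    }

    public static void main(String[] args) {
        long reportedMaxDoc = 2171401446L; // total from the error message; needs a long
        System.out.println("limit = " + MAX_DOCS);
        System.out.println("over limit = " + (reportedMaxDoc > MAX_DOCS));
        System.out.println("partitions needed = " + partitionsNeeded(reportedMaxDoc));
    }
}
```

Note that the combined count no longer fits in an int, so any bookkeeping
across partitions has to use a long or an (index, doc) pair.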

Re: Searching in multiple indexes with more than 2147483519 documents

Posted by Michael McCandless <lu...@mikemccandless.com>.
You can search each index separately yourself and then use the TopDocs.merge
API to merge-sort the results. That API can handle more than 2.1 billion
documents, and each ScoreDoc hit references the shardIndex, so you know which
of your indices to go back to, e.g. to load stored fields.

Mike McCandless

http://blog.mikemccandless.com
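
To illustrate the idea behind the reply, here is a minimal plain-Java sketch of
merging per-index top hits. It is not Lucene's implementation and does not call
Lucene: Hit is a hypothetical stand-in for ScoreDoc, and ShardMergeSketch.merge
is a hypothetical stand-in for what TopDocs.merge does conceptually. It assumes
each index has already returned its own hits sorted by descending score.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class ShardMergeSketch {

    /** Hypothetical stand-in for a Lucene ScoreDoc. */
    static final class Hit {
        final int shardIndex; // which index partition produced the hit
        final int doc;        // doc id, only meaningful inside that partition
        final float score;

        Hit(int shardIndex, int doc, float score) {
            this.shardIndex = shardIndex;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Read position inside one partition's already-sorted hit list. */
    private static final class Cursor {
        final Hit[] hits;
        int pos;

        Cursor(Hit[] hits) { this.hits = hits; }

        Hit current() { return hits[pos]; }
    }

    /** K-way merge: returns up to topN hits across all partitions, best score first. */
    static Hit[] merge(int topN, Hit[][] shardHits) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>((a, b) -> {
            Hit x = a.current();
            Hit y = b.current();
            int byScore = Float.compare(y.score, x.score); // higher score first
            if (byScore != 0) return byScore;
            int byShard = Integer.compare(x.shardIndex, y.shardIndex); // tie-break
            return byShard != 0 ? byShard : Integer.compare(x.doc, y.doc);
        });
        for (Hit[] hits : shardHits) {
            if (hits.length > 0) queue.add(new Cursor(hits));
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < topN && !queue.isEmpty()) {
            Cursor best = queue.poll();
            merged.add(best.current());
            best.pos++;
            if (best.pos < best.hits.length) queue.add(best); // re-queue with its next hit
        }
        return merged.toArray(new Hit[0]);
    }

    public static void main(String[] args) {
        // Doc id 7 appears in both partitions: ids are only unique per index.
        Hit[][] shardHits = {
            { new Hit(0, 7, 3.5f), new Hit(0, 2, 1.0f) },
            { new Hit(1, 7, 2.25f), new Hit(1, 9, 0.5f) },
        };
        for (Hit h : merge(3, shardHits)) {
            System.out.println("index=" + h.shardIndex + " doc=" + h.doc + " score=" + h.score);
        }
    }
}
```

With Lucene itself, the per-index hit arrays would come from one IndexSearcher
per partition and the merge step would be the TopDocs.merge call mentioned
above. Because a doc id is never resolved against a combined reader, the total
document count across partitions is not bound by the single-reader limit.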

