Posted to solr-user@lucene.apache.org by Wael Kader <wa...@softech-lb.com> on 2017/08/07 15:41:03 UTC

IndexReaders cannot exceed 2 Billion

Hello,

I am facing an issue that is making me go crazy.
I am running Solr with its data stored on HDFS, and I have a single-node
setup with an index that had been running fine until today.
I know that 2 billion documents is too many for a single node, but it had
been running fine for my requirements and it was pretty fast.

I restarted Solr today and I am getting an error stating "Too many
documents, composite IndexReaders cannot exceed 2147483519".
The last backup I have is two weeks old, and I really need the index to
come back up so I can get the data out of it.

Please help!
-- 
Regards,
Wael

Re: IndexReaders cannot exceed 2 Billion

Posted by Mike Drob <md...@apache.org>.
> I have no idea whether you can successfully recover anything from that
> index now that it has broken the hard limit.

Theoretically, I think it's possible with some very surgical edits.
However, I've tried to do this in the past and abandoned it. The code to
split the index needs to be able to open it first, so we reasoned that we'd
have no way to demonstrate correctness and at that point restoring from a
backup was the best option.
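
A minimal sketch of that catch-22, for the curious (assumes lucene-core on
the classpath; the index path is a placeholder):

    import java.nio.file.Paths;

    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.store.FSDirectory;

    public class OpenOversizedIndex {
        public static void main(String[] args) throws Exception {
            // Any splitting tool must open the index before it can copy
            // anything out, and the open itself is what fails: once the
            // doc count is past the cap, this throws
            // IllegalArgumentException ("Too many documents, composite
            // IndexReaders cannot exceed 2147483519").
            try (DirectoryReader reader = DirectoryReader.open(
                    FSDirectory.open(Paths.get("/path/to/oversized/index")))) {
                System.out.println("maxDoc: " + reader.maxDoc());
            }
        }
    }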

Maybe somebody smarter or more determined has had a better experience.

Mike


Re: IndexReaders cannot exceed 2 Billion

Posted by Shawn Heisey <ap...@elyograg.org>.
On 8/7/2017 9:41 AM, Wael Kader wrote:
> I am facing an issue that is making me go crazy.
> I am running Solr with its data stored on HDFS, and I have a single-node
> setup with an index that had been running fine until today.
> I know that 2 billion documents is too many for a single node, but it had
> been running fine for my requirements and it was pretty fast.
>
> I restarted Solr today and I am getting an error stating "Too many
> documents, composite IndexReaders cannot exceed 2147483519".
> The last backup I have is two weeks old, and I really need the index to
> come back up so I can get the data out of it.

You have run into what I think might be the only *hard* limit in the
entire Lucene ecosystem.  Other limits can usually be broken with
careful programming, but that one is set in stone.

A Lucene index uses a 32-bit Java integer to track the internal document
ID.  In Java, numeric variables are signed, so an integer cannot exceed
(2^31)-1, which is 2147483647.  Lucene caps the document count at a value
128 lower than that: 2147483647 - 128 = 2147483519.  I'm not sure why,
but it's probably to prevent problems when a small offset is added to the
value.
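
Both numbers are easy to verify in code (a tiny example, assuming
lucene-core on the classpath):

    import org.apache.lucene.index.IndexWriter;

    public class DocIdLimit {
        public static void main(String[] args) {
            // (2^31)-1, the largest signed 32-bit Java integer.
            System.out.println(Integer.MAX_VALUE);    // 2147483647
            // Lucene's hard cap: Integer.MAX_VALUE - 128.
            System.out.println(IndexWriter.MAX_DOCS); // 2147483519
        }
    }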

SolrCloud is perfectly capable of running indexes with far more than two
billion documents, but as Yago mentioned, the collection must be sharded
for that to happen.
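
Creating a collection pre-split across shards looks roughly like this with
SolrJ (a sketch for Solr 6.x; the collection, configset, and ZooKeeper
addresses are placeholders):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateShardedCollection {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
                // 4 shards x 2 replicas; each shard has its own ~2.1
                // billion document ceiling, so the collection as a whole
                // can go far beyond it.
                CollectionAdminRequest
                    .createCollection("bigcollection", "myconfig", 4, 2)
                    .process(client);
            }
        }
    }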

I have no idea whether you can successfully recover anything from that
index now that it has broken the hard limit.

Thanks,
Shawn


Re: IndexReaders cannot exceed 2 Billion

Posted by Yago Riveiro <ya...@gmail.com>.
You have hit the maximum number of docs a single shard can hold.

If I'm not wrong, the only solution is to split the index into more shards (if you are running SolrCloud mode).
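
Roughly, with SolrJ (a sketch for Solr 6.x; the collection, shard, and
ZooKeeper names are placeholders, and SPLITSHARD only applies in SolrCloud
mode):

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class SplitShardExample {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
                // Splits shard1 of "bigcollection" into two sub-shards,
                // roughly halving the document count each shard carries.
                CollectionAdminRequest.splitShard("bigcollection")
                    .setShardName("shard1")
                    .process(client);
            }
        }
    }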

--

/Yago Riveiro

On 7 Aug 2017, 16:48 +0100, Wael Kader <wa...@softech-lb.com>, wrote:
> Hello,
>
> I am facing an issue that is making me go crazy.
> I am running Solr with its data stored on HDFS, and I have a single-node
> setup with an index that had been running fine until today.
> I know that 2 billion documents is too many for a single node, but it had
> been running fine for my requirements and it was pretty fast.
>
> I restarted Solr today and I am getting an error stating "Too many
> documents, composite IndexReaders cannot exceed 2147483519".
> The last backup I have is two weeks old, and I really need the index to
> come back up so I can get the data out of it.
>
> Please help!
> --
> Regards,
> Wael

IndexReaders cannot exceed 2 Billion

Posted by Wael Kader <wa...@softech-lb.com>.
Hello,

I am facing an issue on my live environment and I haven't found a solution yet.
I am running Solr with its data stored on HDFS, and I have a single-node setup with an index that had been running fine until today.
I know that 2 billion documents is too many for a single node, but it had been running fine for my requirements and it was pretty fast.

I restarted Solr today and I am getting an error stating "Too many documents, composite IndexReaders cannot exceed 2147483519".
The last backup I have is two weeks old, and I really need the index to come back up so I can get the data out of it. I can delete data and create a separate shard, but I need the index up first so I can extract the data.

Please help!
-- 
Regards,
Wael