Posted to dev@lucene.apache.org by "Erick Erickson (JIRA)" <ji...@apache.org> on 2017/08/08 17:42:00 UTC

[jira] [Commented] (SOLR-11211) Too many documents, composite IndexReaders cannot exceed 2147483519

    [ https://issues.apache.org/jira/browse/SOLR-11211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16118722#comment-16118722 ] 

Erick Erickson commented on SOLR-11211:
---------------------------------------

bq: I wonder how SOLR allowed me to add more documents than what a single shard can take.

One possible scenario (and the Lucene guys please step in if this is off the wall)...

_segments_ have a base+offset scheme for their internal doc IDs. So one segment might have
base: 1,000,000
docs: 0-1,000

So as long as you're adding documents to Solr (actually Lucene) and _not_ opening searchers, you can keep creating segments forever.

composite IndexReaders look at all the segments and assemble a (conceptual) list of all the docs across all the segments. So the segment above would contribute docs 1,000,000-1,001,000 to that list.
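
To make this concrete, here's a rough sketch (untested; the index path is hypothetical) that walks the leaves of a composite reader and prints each segment's base and doc counts. On a healthy index you'd see the offsets line up as above; on your index, just opening the reader is what throws:

{code:java}
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class SegmentBases {
  public static void main(String[] args) throws Exception {
    // Hypothetical path -- point this at a core's data/index directory.
    try (Directory dir = FSDirectory.open(Paths.get("/path/to/core/data/index"));
         DirectoryReader reader = DirectoryReader.open(dir)) {
      for (LeafReaderContext ctx : reader.leaves()) {
        // docBase is the offset added to each segment's local doc IDs
        // in the composite view.
        System.out.println("docBase=" + ctx.docBase
            + " maxDoc=" + ctx.reader().maxDoc()
            + " numDocs=" + ctx.reader().numDocs());
      }
      // The composite reader's maxDoc is the sum of the per-segment maxDocs.
      System.out.println("total maxDoc=" + reader.maxDoc());
    }
  }
}
{code}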

Plus note that numDocs isn't the count that matters here; maxDoc is, and maxDoc includes deleted documents.
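
For reference, the number in the error message is Lucene's hard cap, IndexWriter.MAX_DOCS (Integer.MAX_VALUE - 128). A trivial check, assuming lucene-core is on the classpath:

{code:java}
import org.apache.lucene.index.IndexWriter;

public class MaxDocsCheck {
  public static void main(String[] args) {
    // The hard cap the error refers to: Integer.MAX_VALUE - 128 = 2147483519.
    System.out.println(IndexWriter.MAX_DOCS);
  }
}
{code}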

As for recovering your data: this occurred to me, but I have not tested whether it will work.
1> copy half of the segments to each of two new cores.
2> run CheckIndex with -fix (a sketch of the commands for this step and step 4 follows the list). This will drop any "bad" segments; in this case I believe it will rewrite your segments file to reference only the segments that are actually present.
3> examine both cores to verify they're what you expect.
4> run MERGEINDEXES (https://cwiki.apache.org/confluence/display/solr/Merging+Indexes) to bring them back together.
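
Roughly, steps 2> and 4> would look something like this. All paths, jar versions, and core names here are made up, and depending on your Lucene version the CheckIndex option is -fix or -exorcise (it was renamed in 5.0):

{code}
# 2> drop unreferenced/"bad" segments -- this is destructive, so run it
#    against the copies only, never the original index
java -cp lucene-core-6.6.0.jar org.apache.lucene.index.CheckIndex \
  /path/to/core1/data/index -exorcise

# 4> merge the two repaired cores into a fresh target core via CoreAdmin
curl 'http://localhost:8983/solr/admin/cores?action=MERGEINDEXES&core=target&srcCore=core1&srcCore=core2'
{code}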

It's worth a shot anyway. It's a band-aid; longer term you'll want to split this shard for a variety of reasons.

This is actually a Lucene-level limitation, and unlikely to be changed any time soon as it's a very large undertaking.

> Too many documents, composite IndexReaders cannot exceed 2147483519
> -------------------------------------------------------------------
>
>                 Key: SOLR-11211
>                 URL: https://issues.apache.org/jira/browse/SOLR-11211
>             Project: Solr
>          Issue Type: Task
>      Security Level: Public(Default Security Level. Issues are Public) 
>         Environment: Hadoop Centos6
>            Reporter: Wael
>
> I am running a single-node Hadoop SOLR machine with 64 GB of RAM.
> The issue is that I was using the machine successfully until yesterday, when I made a restart and one of the indexes I am working on wouldn't start, giving the error: "Too many documents, composite IndexReaders cannot exceed 2147483519". 
> I wonder how SOLR allowed me to add more documents than what a single shard can take. I need a solution to start up the index, and I don't want to lose all the data as I only have a 2-week-old backup. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org