Posted to dev@lucene.apache.org by Jack Krupansky <ja...@basetechnology.com> on 2012/06/03 20:18:15 UTC

What is the maximum document number?

Doing a little more research on document numbers: I had thought that the maximum document number was 2^30-1, or Integer.MAX_INT, but I see that IndexReader.numDocs, maxDoc, and the corresponding IndexWriter methods return the number of documents as an int. Since document numbers start at zero, the number of documents is actually limited to 2^30-1, so the highest document number is limited to one less than that, or 2^30-2.

-- Jack Krupansky

Re: What is the maximum document number?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Double oops... it is Integer.MAX_VALUE, not Integer.MAX_INT.

-- Jack Krupansky

From: Jack Krupansky 
Sent: Sunday, June 03, 2012 5:11 PM
To: dev@lucene.apache.org 
Subject: Re: What is the maximum document number?

Oops, I indicated that Integer.MAX_INT was 2^30-1, but it is 2^31-1 or 2,147,483,647. So the largest document number in Lucene would be 2,147,483,646 and the largest number (count) of documents would be 2,147,483,647.

-- Jack Krupansky

From: Jack Krupansky 
Sent: Sunday, June 03, 2012 4:09 PM
To: dev@lucene.apache.org 
Subject: Re: What is the maximum document number?

The javadoc for IR.maxDoc refers to “largest possible document number”, but the word “possible” is confusing. Superficially it sounds like the largest document number that Lucene can ever assign, but really it is simply the “largest document number in the index at the moment, including deleted documents.”

The javadoc should probably simply say: numDocs = maxDoc - numDeletedDocs


-- Jack Krupansky

From: Uwe Schindler 
Sent: Sunday, June 03, 2012 3:47 PM
To: dev@lucene.apache.org 
Subject: Re: What is the maximum document number?

Hi,

In fact, maxDoc is not a maximum; it is also a count. If there are no deletions in an index, maxDoc == numDocs. Unfortunately, that's how it is; maybe we should rename it in 4.0.

Uwe
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de

Re: What is the maximum document number?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Oops, I indicated that Integer.MAX_INT was 2^30-1, but it is 2^31-1 or 2,147,483,647. So the largest document number in Lucene would be 2,147,483,646 and the largest number (count) of documents would be 2,147,483,647.

-- Jack Krupansky
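The arithmetic in the corrected message can be checked with a few lines of Java. This is a minimal sketch, assuming (as the thread does) that document numbers are non-negative ints starting at 0 and that the document count is returned as an int; the class name DocNumberLimits is hypothetical.

```java
// Sketch of the thread's arithmetic, assuming docIDs are non-negative Java ints
// numbered from 0 and that the document count is itself an int.
// DocNumberLimits is a hypothetical name, not a Lucene class.
public class DocNumberLimits {

    // The largest value a Java int can hold: 2^31 - 1.
    static final int MAX_COUNT = Integer.MAX_VALUE;

    // If the count fits in an int and docIDs run 0 .. count-1,
    // the highest docID is one less than the largest possible count.
    static int maxDocId() {
        return MAX_COUNT - 1;
    }

    public static void main(String[] args) {
        long twoTo31MinusOne = (1L << 31) - 1; // computed as long to avoid int overflow
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("2^31 - 1          = " + twoTo31MinusOne);
        System.out.println("largest count     = " + MAX_COUNT);
        System.out.println("largest docID     = " + maxDocId());
    }
}
```

Running main prints 2147483647 on the first three lines and 2147483646 for the largest docID, matching the corrected figures above under those assumptions.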

Re: What is the maximum document number?

Posted by Jack Krupansky <ja...@basetechnology.com>.
The javadoc for IR.maxDoc refers to “largest possible document number”, but the word “possible” is confusing. Superficially it sounds like the largest document number that Lucene can ever assign, but really it is simply the “largest document number in the index at the moment, including deleted documents.”

The javadoc should probably simply say: numDocs = maxDoc - numDeletedDocs


-- Jack Krupansky
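The relationship proposed for the javadoc can be illustrated with a toy model. This is a hypothetical sketch, not Lucene's implementation: ToyIndex and its methods only mimic the names maxDoc, numDocs and numDeletedDocs, and it assumes that a deleted document keeps its docID slot until the index is rewritten.

```java
import java.util.BitSet;

// Hypothetical toy model (not Lucene's API): docIDs are handed out
// sequentially from 0, and deleting a document only marks it, so its
// docID keeps occupying a slot.
public class ToyIndex {
    private int nextDocId = 0;                   // also the number of docIDs assigned so far
    private final BitSet deleted = new BitSet(); // deletion marks, indexed by docID

    int addDocument() {
        return nextDocId++;                      // returns the new document's docID
    }

    void deleteDocument(int docId) {
        if (docId < 0 || docId >= nextDocId) {
            throw new IllegalArgumentException("no such docID: " + docId);
        }
        deleted.set(docId);                      // mark only; the slot is not reused
    }

    // A count of docID slots, deleted ones included: one past the highest docID.
    int maxDoc() { return nextDocId; }

    int numDeletedDocs() { return deleted.cardinality(); }

    // The relationship suggested for the javadoc.
    int numDocs() { return maxDoc() - numDeletedDocs(); }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (int i = 0; i < 5; i++) index.addDocument(); // docIDs 0..4
        index.deleteDocument(1);
        index.deleteDocument(3);
        // prints maxDoc=5 numDeletedDocs=2 numDocs=3
        System.out.println("maxDoc=" + index.maxDoc()
                + " numDeletedDocs=" + index.numDeletedDocs()
                + " numDocs=" + index.numDocs());
    }
}
```

In this model maxDoc (5) is one past the highest docID (4), which is why it behaves as a count rather than a maximum.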

Re: What is the maximum document number?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Hi,

In fact, maxDoc is not a maximum; it is also a count. If there are no deletions in an index, maxDoc == numDocs. Unfortunately, that's how it is; maybe we should rename it in 4.0.

Uwe
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
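Because maxDoc is a count, it works as an exclusive upper bound when visiting every docID, and it equals the number of live documents only when nothing is deleted. A minimal sketch, assuming a BitSet of deletion marks as a hypothetical stand-in for however the index actually records deletions; MaxDocAsBound and countLive are illustrative names only.

```java
import java.util.BitSet;

// Sketch: maxDoc used as the exclusive upper bound over docIDs 0 .. maxDoc-1.
// The BitSet of deletions is a hypothetical stand-in, not Lucene's API.
public class MaxDocAsBound {

    // Counts live (non-deleted) documents by visiting every docID slot.
    static int countLive(int maxDoc, BitSet deleted) {
        int live = 0;
        for (int docId = 0; docId < maxDoc; docId++) { // '<', never '<=': maxDoc is a count
            if (!deleted.get(docId)) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        int maxDoc = 4;
        BitSet none = new BitSet();
        BitSet some = new BitSet();
        some.set(2);
        System.out.println(countLive(maxDoc, none)); // no deletions: equals maxDoc
        System.out.println(countLive(maxDoc, some)); // one deletion: one less than maxDoc
    }
}
```

The same reasoning is why an array with one element per document is allocated as new int[maxDoc] rather than new int[maxDoc + 1].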


