Posted to java-user@lucene.apache.org by Ahmet Arslan <io...@yahoo.com.INVALID> on 2015/05/29 14:18:39 UTC

IllegalArgumentException: docID must be >= 0 and < maxDoc=48736112 (got docID=2147483647)

Hello List,

When a similarity returns NEGATIVE_INFINITY, hits[i].doc becomes 2147483647 (Integer.MAX_VALUE).
As a result, an exception is thrown by the following code:

for (int i = 0; i < hits.length; i++) {
    int docId = hits[i].doc;
    // Throws the IllegalArgumentException above when docId is 2147483647.
    Document doc = searcher.doc(docId);
}

I know it is awkward to return negative infinity (it comes from log(0)), but the exception looks equally
awkward and uninformative.

Do you think this is something that could be improved? Can we handle this case better?
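One way to get a more informative failure is to validate each hit before calling searcher.doc(). The sketch below assumes, following Robert's reply in this thread, that a doc ID of 2147483647 only appears when a slot of the collector's priority queue was never filled by a real hit. `HitChecks` and `checkHit` are hypothetical names, not Lucene API.

```java
/**
 * Hypothetical helper (not part of Lucene): validates a hit before it is loaded,
 * so an unfilled queue slot yields an explanatory message instead of the
 * generic "docID must be >= 0 and < maxDoc" error.
 */
final class HitChecks {
    private HitChecks() {}

    static void checkHit(int docId, float score, int maxDoc) {
        // Assumption: an unfilled queue slot carries doc=Integer.MAX_VALUE (2147483647).
        if (docId == Integer.MAX_VALUE) {
            throw new IllegalStateException(
                "docID=" + docId + " (score=" + score + ") looks like an unfilled priority-queue"
                + " sentinel; check the similarity for NaN or -Infinity scores");
        }
        // Same range check that the reader performs, with the same wording.
        if (docId < 0 || docId >= maxDoc) {
            throw new IllegalArgumentException(
                "docID must be >= 0 and < maxDoc=" + maxDoc + " (got docID=" + docId + ")");
        }
    }
}
```

In the loop above it would be called as HitChecks.checkHit(hits[i].doc, hits[i].score, searcher.getIndexReader().maxDoc()) just before searcher.doc(docId). It only improves the error message; the underlying problem is still the invalid score.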
 
Thanks,
Ahmet

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IllegalArgumentException: docID must be >= 0 and < maxDoc=48736112 (got docID=2147483647)

Posted by Ahmet Arslan <io...@yahoo.com.INVALID>.
Hi Robert,

Great info. I guarded against the corner cases in the similarities
that were producing NaN or negative infinity scores.

All is well with -ea now.
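The message does not show how those corner cases were handled. As one possible approach, assuming the -Infinity comes from taking the log of a zero probability, a score term can clamp its argument to a small floor, and scores can be checked against the two values the TopScoreDocCollector javadoc quoted in this thread calls invalid. `ScoreGuards`, `safeLog`, `checkScore` and the `MIN_PROB` value are all hypothetical.

```java
/**
 * Hypothetical guards (not the actual change described above, which the
 * thread does not show): keep a log-based score term finite and reject
 * scores that a TopScoreDocCollector cannot collect correctly.
 */
final class ScoreGuards {
    private ScoreGuards() {}

    // Assumed floor for probabilities; a suitable value depends on the model.
    static final double MIN_PROB = 1e-10;

    /** Returns log(p), clamping p to MIN_PROB so log(0) cannot yield -Infinity. */
    static float safeLog(double p) {
        if (Double.isNaN(p) || p < MIN_PROB) {
            p = MIN_PROB;
        }
        return (float) Math.log(p);
    }

    /** Throws if the score is NaN or -Infinity; otherwise returns it unchanged. */
    static float checkScore(float score) {
        if (Float.isNaN(score) || score == Float.NEGATIVE_INFINITY) {
            throw new IllegalStateException("invalid score: " + score);
        }
        return score;
    }
}
```

Clamping makes every zero-probability term tie at the same floor value, so the floor should be chosen with the model's ranking behavior in mind.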

Thanks,
Ahmet



On Friday, May 29, 2015 3:32 PM, Robert Muir <rc...@gmail.com> wrote:
Hi Ahmet,

It's due to the sentinel values that your collector uses in its
priority queue by default.

TopScoreDocCollector warns about this, and if you turn on assertions
(-ea) you will hit them in your tests:

* <p><b>NOTE</b>: The values {@link Float#NaN} and
* {@link Float#NEGATIVE_INFINITY} are not valid scores.  This
* collector will not properly collect hits with such
* scores.
*/
public abstract class TopScoreDocCollector extends TopDocsCollector<ScoreDoc> {

I don't think a fix is simple; I only know of the following ideas:
* somehow sneakily use NaN as the sentinel instead of -Inf, to allow
-Inf to be used. It seems a bit scary!
* remove the sentinel optimization. I am not sure if collectors could
easily have the same performance without it.

To me, such scores always seem undesirable and are only ever the result of bugs, and the
current assertions are a good tradeoff.




Re: IllegalArgumentException: docID must be >= 0 and < maxDoc=48736112 (got docID=2147483647)

Posted by Robert Muir <rc...@gmail.com>.
Hi Ahmet,

It's due to the sentinel values that your collector uses in its
priority queue by default.

TopScoreDocCollector warns about this, and if you turn on assertions
(-ea) you will hit them in your tests:

 * <p><b>NOTE</b>: The values {@link Float#NaN} and
 * {@link Float#NEGATIVE_INFINITY} are not valid scores.  This
 * collector will not properly collect hits with such
 * scores.
 */
public abstract class TopScoreDocCollector extends TopDocsCollector<ScoreDoc> {
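A simplified model may make the sentinel effect concrete. The sketch assumes, loosely following Lucene's HitQueue, that every queue slot is pre-filled with a sentinel of doc=Integer.MAX_VALUE and score=-Infinity, that a new hit must strictly beat the current minimum to be inserted, and that the number of returned hits is min(totalHits, k). `SentinelTopKDemo`, `Hit` and `topK` are hypothetical names; this is not Lucene's code.

```java
import java.util.PriorityQueue;

/**
 * Simplified model of a sentinel-prefilled top-k queue (not Lucene's actual code):
 * it shows why a hit scored -Infinity surfaces as doc=Integer.MAX_VALUE.
 */
final class SentinelTopKDemo {
    // Stand-in for Lucene's ScoreDoc, for illustration only.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static Hit[] topK(float[] scores, int k) {
        // Min-heap on score; on equal scores the larger doc ID counts as worse.
        PriorityQueue<Hit> pq = new PriorityQueue<>(Math.max(1, k), (a, b) -> {
            int c = Float.compare(a.score, b.score);
            return c != 0 ? c : Integer.compare(b.doc, a.doc);
        });
        // Pre-fill with sentinels so the collect loop never checks the queue size.
        for (int i = 0; i < k; i++) {
            pq.add(new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY));
        }
        int totalHits = 0;
        for (int doc = 0; doc < scores.length; doc++) {
            float score = scores[doc];
            totalHits++; // every matching doc is counted...
            if (k == 0 || score <= pq.peek().score) {
                continue; // ...but a -Infinity score never displaces a sentinel
            }
            pq.poll();
            pq.add(new Hit(doc, score));
        }
        // Return the best min(totalHits, k) entries, dropping the lowest first.
        int size = Math.min(totalHits, k);
        while (pq.size() > size) {
            pq.poll();
        }
        Hit[] out = new Hit[size];
        for (int i = size - 1; i >= 0; i--) {
            out[i] = pq.poll(); // poll yields ascending order, so fill from the end
        }
        return out;
    }
}
```

With scores {1.0f, Float.NEGATIVE_INFINITY} and k=2, both documents count toward totalHits, but only doc 0 enters the queue, so the second returned hit is a leftover sentinel with doc 2147483647, matching the original report.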

I don't think a fix is simple; I only know of the following ideas:
* somehow sneakily use NaN as the sentinel instead of -Inf, to allow
-Inf to be used. It seems a bit scary!
* remove the sentinel optimization. I am not sure if collectors could
easily have the same performance without it.
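The second idea, dropping the sentinels, can be sketched by starting with an empty queue and checking its size on every hit. This is a hypothetical simplification, not a proposed Lucene patch; the extra branch per collected document is the cost whose performance impact is uncertain.

```java
import java.util.PriorityQueue;

/**
 * Sketch of a top-k queue without sentinels (simplified, not Lucene code):
 * the queue starts empty, so each hit pays a size check, but -Infinity
 * scores come back with their real doc IDs.
 */
final class NoSentinelTopK {
    // Stand-in for Lucene's ScoreDoc, for illustration only.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    static Hit[] topK(float[] scores, int k) {
        // Min-heap on score; on equal scores the larger doc ID counts as worse.
        PriorityQueue<Hit> pq = new PriorityQueue<>(Math.max(1, k), (a, b) -> {
            int c = Float.compare(a.score, b.score);
            return c != 0 ? c : Integer.compare(b.doc, a.doc);
        });
        for (int doc = 0; doc < scores.length; doc++) {
            float score = scores[doc];
            if (pq.size() < k) {
                pq.add(new Hit(doc, score)); // queue not full yet: always accept
            } else if (k > 0 && score > pq.peek().score) {
                pq.poll(); // evict the current worst hit
                pq.add(new Hit(doc, score));
            }
        }
        Hit[] out = new Hit[pq.size()];
        for (int i = out.length - 1; i >= 0; i--) {
            out[i] = pq.poll(); // poll yields ascending order, so fill from the end
        }
        return out;
    }
}
```

This only removes the -Infinity problem: Float.compare orders NaN above every other value, so NaN scores would still need to be rejected separately.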

To me, such scores always seem undesirable and are only ever the result of bugs, and the
current assertions are a good tradeoff.
