You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2011/05/02 15:04:03 UTC

[jira] [Commented] (LUCENE-3054) add assert to sorts catch broken comparators in tests

    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027639#comment-13027639 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

I investigated what happens here:

The problem is indeed quickSort, but not undernormal circumstances. The problem with quickSort (just google for stack overflow and quicksort) is that it only works fine for arrays with many values. Once you only have few distinct values and a large array, depending on the oreder it may happen that it splits into two subarrays for next iteration, where one is very large and the other only contains few items.

Attached is a patch, that shows the problem. It almost every time stack overflows. Also quicksort is very *slow* for this case.

This is exactly what happens on PhraseQuery: we only have very few distinct items and possibly a very huge array. To fix this, we should change PhraseQuery to use mergeSort instead. Mergesort is also much faster in this case, as it always splits the array in the center. So the number of iterations is limited.

For TermsHash/BytesRefHash its mostly also not a problem, as the values (the terms are 100% distict, as only the hash is sorted).

But there may still be the slight chance this messes up. I propose to change SorterTemplate to fall back to mergeSort once it checks that number of iterations grows e.g. > 20 (have to test a little bit).

I will change that issue to higher priority and we also need to backport to 3.1.

> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>         Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org