You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2011/04/29 19:47:03 UTC

[jira] [Created] (LUCENE-3054) add assert to sorts catch broken comparators in tests

add assert to sorts catch broken comparators in tests
-----------------------------------------------------

                 Key: LUCENE-3054
                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
             Project: Lucene - Java
          Issue Type: Task
            Reporter: Robert Muir
         Attachments: LUCENE-3054.patch

Looking at Otis's sort problem on the mailing list, he said:
{noformat}
* looked for other places where this call is made - found it in
MultiPhraseQuery$MultiPhraseWeight and changed that call from
ArrayUtil.quickSort to ArrayUtil.mergeSort
* now we no longer see SorterTemplate.quickSort in deep recursion when we do a
thread dump
{noformat}

I thought this was interesting because PostingsAndFreq's comparator
looks like it needs a tiebreaker.

I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3054.
-----------------------------------

    Resolution: Fixed

Committed trunk revision: 1099041
Merged 3.x revision: 1099045
Merged 3.1 revision: 1099046

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-dynamic.patch, LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) add assert to sorts catch broken comparators in tests

Posted by "Otis Gospodnetic (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic updated LUCENE-3054:
-------------------------------------

    Affects Version/s: 3.1

Btw. this is with Lucene 3.1
For full thread: http://search-lucene.com/m/ytANA59Q9G1


> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>         Attachments: LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Assigned] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler reassigned LUCENE-3054:
-------------------------------------

    Assignee: Uwe Schindler

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027702#comment-13027702 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

Committed trunk revision: 1098633

Now merging...

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13028125#comment-13028125 ] 

Michael McCandless commented on LUCENE-3054:
--------------------------------------------

Patch looks good!  I like the 2*log_2(N) dynamic cutover; this means we can tolerate somewhat lopsided QS recursion and remain using QS.

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-dynamic.patch, LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027662#comment-13027662 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

Due to the realtime merge (LUCENE-3023), suddenly DocFieldProcessor got a reincarnation of quicksort again... will remove, too

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027774#comment-13027774 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

Dawid:
There are two problems we have seen with native sort:
- it copies the array/collection always first, this caused slowdown for lots of places especiall in automaton - so it never sorts in plcace
- we sometimes need to sort multiple arrays in parallel, one as sort "key" -> especially in TermsHash/BytesRefHash. This is where SorterTemplate comes into the game: it supports separate swap(i,j) and compare(i,j) operations.

Uwe

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027730#comment-13027730 ] 

Michael McCandless commented on LUCENE-3054:
--------------------------------------------

Also, I think PQ.PostingsAndFreq.compare is still able to return ties, if the app puts the same term at the same position (which is a silly thing to do... but, still possible).

I think instead of disambiguating by Term, we should disambiguate by ord (ie, position of this term in the array of the query itself), since that can never be the same for entries in the array?

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027643#comment-13027643 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

As quicksort gets insanely slow when these type of data gets sorted, this also explains Otis' slowdown.

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Priority: Critical
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment: LUCENE-3054.patch

Sorry, the safety net is only needed at 40 (from my tests), before it may affect BytesRefHash performance.

I will commit later!

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) add assert to sorts catch broken comparators in tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3054:
--------------------------------

    Attachment: LUCENE-3054.patch

really ugly prototype... i expect the generics/sort policeman will want to jump in here anyway :)

but it does catch that problem:
{noformat}
    [junit] Testsuite: org.apache.lucene.index.TestCodecs
    [junit] Testcase: testSepPositionAfterMerge(org.apache.lucene.index.TestCodecs):    FAILED
    [junit] insane comparator for: org.apache.lucene.search.PhraseQuery$PostingsAndFreq
{noformat}

> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>            Reporter: Robert Muir
>         Attachments: LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Fix Version/s: 4.0
                   3.2
                   3.1.1

Set fix versions (also backport to 3.1.1, as its serious for some large PhraseQueries and a serious slowdown then).

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027948#comment-13027948 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

Studying the C++ STL code showed that they use 2 * log2(n) as depth limit. I implemented that. It showed that for the most cases in Lucene (BytesRefHash), it uses quicksort (so no change to performance). The other cases use already mergeSort and the "bad" test in TestArrayUtil switches sucessfully to mergeSort.

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-dynamic.patch, LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027892#comment-13027892 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

{quote}
Maybe we leave our QS as is (except, changing the 40 to be dynamic
depending on input length), noting that you should not use it if your
comparator does not break ties, and even if it does there are still
risks because of potentially bad pivot selection?
{quote}

That looks like this: http://en.wikipedia.org/wiki/Introsort

We only need a good recursion depth where to switch!

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Reopened] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reopened LUCENE-3054:
----------------------------------------


Reopening so we can discuss things further...:

QuickSort is dangerous!  Yet, it's definitely faster than MergeSort
for some cases (~20% faster when sorting terms for writing segment, in
quick test I ran on Wikipedia content).

So the core issue is we should not use QS when there's a risk of any
ties, because in that case it can run really slowly or hit infinite
recursion.

And we (well, Otis; thank you!) found one such place today (where
MultiPhraseQuery sorts its terms) where we could have many ties and
thus run very slowly / hit stack overflow.

I appreciate the motivation for the "safety net", but, it makes me
nervous... because, say we had done this a few months back... then
Otis likely would not have reported the issue?  Ie, the
MultiPhraseQuery would run slowly... which could evade detection
(people may just think it's slow).

I prefer brittle fails over silent slowdowns because the brittle fail
gets your attention and you get a real fix in.  Silent slowdowns evade
detection.  Sort of like the difference between a virus and
spyware...

Also, what's preventing us from accidentally using QS somewhere in the
future, where we shouldn't?  What's going to catch us?

Robert's first patch would catch this and protect us going forward?

Or, maybe we could strengthen that approach and "assert cmp != 0"
inside QS (ie, no ties are allowed to be passed to QS)?

Though, using asserts only is risky, because it could be the
comparator may return 0, but it's just that none of our test cases
tickled it.

Maybe instead we could do this in a type-safe way: make a new
NoTiesComparator whose compare method can only return LESS_THAN or
GREATER_THAN?  And then QS would require NoTiesComparator.  Could that
work?


> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment: LUCENE-3054-dynamic.patch

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-dynamic.patch, LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027780#comment-13027780 ] 

Dawid Weiss commented on LUCENE-3054:
-------------------------------------

Thanks Uwe, I didn't know about it. Still, the algorithm folks developing OpenJDK have implemented is public, so an improvement can be filed -- maybe somebody will find the time to implement it in a version suitable for Lucene.

http://en.wikipedia.org/wiki/Timsort

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Dawid Weiss (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027772#comment-13027772 ] 

Dawid Weiss commented on LUCENE-3054:
-------------------------------------

I'm sure many of you know this, but there is a new implementation of mergesort in java.util.Collections -- it is based on a few clever heuristics (so it is a merge sort, only a finely tuned one) and has been ported/ partially inspired by the sort in Python as far as I recall.

Maybe it'd be sensible to compare against this and see what happens. I know Lucene/Solr would rather have its own implementation so that it doesn't rely on the standard library, but in my benchmarks the implementation in Collections.sort() was hard to beat...

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment: LUCENE-3054.patch

Better test that fails faster in case of quickSort bug

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) add assert to sorts catch broken comparators in tests

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027639#comment-13027639 ] 

Uwe Schindler commented on LUCENE-3054:
---------------------------------------

I investigated what happens here:

The problem is indeed quickSort, but not undernormal circumstances. The problem with quickSort (just google for stack overflow and quicksort) is that it only works fine for arrays with many values. Once you only have few distinct values and a large array, depending on the oreder it may happen that it splits into two subarrays for next iteration, where one is very large and the other only contains few items.

Attached is a patch, that shows the problem. It almost every time stack overflows. Also quicksort is very *slow* for this case.

This is exactly what happens on PhraseQuery: we only have very few distinct items and possibly a very huge array. To fix this, we should change PhraseQuery to use mergeSort instead. Mergesort is also much faster in this case, as it always splits the array in the center. So the number of iterations is limited.

For TermsHash/BytesRefHash its mostly also not a problem, as the values (the terms are 100% distict, as only the hash is sorted).

But there may still be the slight chance this messes up. I propose to change SorterTemplate to fall back to mergeSort once it checks that number of iterations grows e.g. > 20 (have to test a little bit).

I will change that issue to higher priority and we also need to backport to 3.1.

> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>         Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Resolved] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-3054.
-----------------------------------

    Resolution: Fixed

Merged 3.x revision: 1098639
Merged 3.1 revision: 1098641

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment: LUCENE-3054.patch

Here the patch that combines Robert's optimization for PhraseQuery (term with lower docFreq will also have less positions) and the safety for quickSort at all.

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment: LUCENE-3054-dynamic.patch

Here a patch which implements what introsort does: if the depth of recursion is >75% of log2(n), switch to mergeSort.

Also this patch moves all remaining quickSort calls to mergeSort on search side, where the comparators are not good. A few remaining ones in indexer keep alive, but those are all unique sets of terms or field names (needs some more review tomorrow).

Mike: What do you think, maybe you can do some benchmarking?

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-dynamic.patch, LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027640#comment-13027640 ] 

Robert Muir commented on LUCENE-3054:
-------------------------------------

{quote}
I propose to change SorterTemplate to fall back to mergeSort once it checks that number of iterations grows e.g. > 20 (have to test a little bit).
{quote}

I like the idea of some "guard" here to prevent the stack overflow, and hopefully keep the quickSort performance for the places where we know its better than mergesort.


> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Priority: Critical
>         Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Priority: Critical  (was: Major)
     Summary: SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays  (was: add assert to sorts catch broken comparators in tests)

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Priority: Critical
>         Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) add assert to sorts catch broken comparators in tests

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3054:
--------------------------------

    Attachment: LUCENE-3054.patch

i expanded the patch to all the sorts, just to find all the wierd sorting/comparators going on.

it also finds some false positives, ones that are documented as inconsistent with equals, ones in tests, etc.

but we can at least look into the ones it finds.

> add assert to sorts catch broken comparators in tests
> -----------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>         Attachments: LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment:     (was: LUCENE-3054-dynamic.patch)

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-dynamic.patch, LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment: LUCENE-3054-stackoverflow.patch

Patch that shows the issue.

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Priority: Critical
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Updated] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-3054:
----------------------------------

    Attachment: LUCENE-3054.patch

Final patch.

After some discussion with robert: The use of QuickSort is fine after the comparator was fixed to not only sort by docFreq.

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] [Commented] (LUCENE-3054) SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-3054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027808#comment-13027808 ] 

Michael McCandless commented on LUCENE-3054:
--------------------------------------------

So, there are two known improvements to our QS, to try to avoid the O(N^2)
worst-case, both from Robert Sedgewick.

First, it's better to select median of low/mid/high as the pivot
(http://en.wikipedia.org/wiki/Quicksort#Choice_of_pivot).  Second, we
should handle "equal" values better
(http://www.angelfire.com/pq/jamesbarbetti/articles/sorting/001_QuicksortIsBroken.htm#Duplicates).

See also Lucy's nice QS impl:

  http://svn.apache.org/viewvc/incubator/lucy/trunk/core/Lucy/Util/SortUtils.c?revision=1098445&view=markup#l331

which I think addresses the above two issues, and goes even further
(eq-to-pivot values are explicitly "moved to the middle" and then not
recursed on).

The thing is, fixing these will make our QS more "general", at the
expense of some added cost for the cases we know work fine today (eg
sorting terms before flushing a segment).

Maybe we leave our QS as is (except, changing the 40 to be dynamic
depending on input length), noting that you should not use it if your
comparator does not break ties, and even if it does there are still
risks because of potentially bad pivot selection?

Or, maybe we remove QS always use MS?  Yes, there's a hit to the sort
when flushing the segment, but this is a tiny cost compared to the
rest of segment flushing...

Separately we can look into whether the tool timsort is faster for
sorting terms for flush....

> SorterTemplate.quickSort stack overflows on broken comparators that produce only few disticnt values in large arrays
> --------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-3054
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3054
>             Project: Lucene - Java
>          Issue Type: Task
>    Affects Versions: 3.1
>            Reporter: Robert Muir
>            Assignee: Uwe Schindler
>            Priority: Critical
>             Fix For: 3.1.1, 3.2, 4.0
>
>         Attachments: LUCENE-3054-stackoverflow.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch, LUCENE-3054.patch
>
>
> Looking at Otis's sort problem on the mailing list, he said:
> {noformat}
> * looked for other places where this call is made - found it in
> MultiPhraseQuery$MultiPhraseWeight and changed that call from
> ArrayUtil.quickSort to ArrayUtil.mergeSort
> * now we no longer see SorterTemplate.quickSort in deep recursion when we do a
> thread dump
> {noformat}
> I thought this was interesting because PostingsAndFreq's comparator
> looks like it needs a tiebreaker.
> I think in our sorts we should add some asserts to try to catch some of these broken comparators.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org