You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Uwe Schindler (JIRA)" <ji...@apache.org> on 2010/10/22 23:46:20 UTC

[jira] Created: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
----------------------------------------------------------------------------------------------

                 Key: LUCENE-2719
                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
             Project: Lucene - Java
          Issue Type: Improvement
    Affects Versions: 3.1
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler
             Fix For: 3.1, 4.0


This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:

- Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
- BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).

SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719-CollSupport.patch

After spending the evening with performance tests on BytesRefHash and Fuzzy automatons I cam to the following conclusion, finalized in this hopefully last patch:

- Automatons use very short, mostly presorted arrays. Quicksort is ineffective for them. Initial tests showed that even insertionSort is enough for most of the Transition arrays. As some automatons also contain very large Transition arrays, it showed, that then insertionSAort gets very slow. Quicksort gets better, but as the array is already sorted, mergesort beats them all. SorterTemplate.mergeSort contains a limit, so when array size is < 12 entries, it uses insertion sort for the sorting (also in later merge steps if the partitioned array gets < 12 entries).
schindlerMinimize and mccandlessDeterminize are now using mergesort.
- BytesRefHash gets about 10% speed improvement by the recent extension to SorterTemplate with setPivot/comparePivot abstract methods. This beats the old algorithm which is currently in trunk, as for the quicksort algorithm used, the swapping of entries in the mail loop always compares to the pivot value. If BytesRefHash needs to resolve this values every time, it gets slow. The new patch improves a modified TestBytesRefHash.testSort for perf testing by 11% (runs with -Xbatch -server -Xmx1024m, Java 1.5 on my computer in 12.5 secs on trunk, 11.1 secs with this patch):

{code}
public void testSortPerf() {
  long start = System.currentTimeMillis();
  BytesRef ref = new BytesRef();
  for (int j = 0; j < 200 * RANDOM_MULTIPLIER; j++) {
    for (int i = 0; i < 1797; i++) {
      String str;
      do {
        str = _TestUtil.randomRealisticUnicodeString(random, 1000);
      } while (str.length() == 0);
      ref.copy(str);
      hash.add(ref);
    }
    hash.sort(BytesRef.getUTF8SortedAsUTF16Comparator());
    hash.clear();
    hash.reinit();
  }
  System.out.println("time: "+(System.currentTimeMillis()-start));
}
{code}

I will commit this patch, which now also makes insertionSort public in SorterTemplate, ArrayUtil and CollectionUtil tomorrow. I tend to also commit this to 3.x (merged to BytesRefHash-similar class from 3.x). This is why the CHANGES.txt removes the SorterTemplate removal message (may need to be modified, because SorterTemplate changed API). If we will only commit to trunk, CHANGES would keep unchanged.

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924202#action_12924202 ] 

Michael McCandless commented on LUCENE-2719:
--------------------------------------------

I tested indexing first 10M wikipedia 1KB docs  and running various queries and found no real perf change, or at least less than the noise in the test...

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924051#action_12924051 ] 

Uwe Schindler commented on LUCENE-2719:
---------------------------------------

Using this class we can look for more useless quickSort code duplication. One is e.g. in DocFieldProcessorPerThread. Maybe more of them.

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment:     (was: LUCENE-2719-CollSupport.patch)

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719-CollSupport.patch

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719.patch

New patch:
- Cleaned up (removed useless imports)
- Added test that verifies that mergeSort() is stable - this is now validated and we can use mergeSort() in ConjunctionScorer
- Further test optimization, so also totally reversed arrays get sorted correctly (special case)

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Issue Comment Edited: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925350#action_12925350 ] 

Uwe Schindler edited comment on LUCENE-2719 at 10/27/10 8:29 AM:
-----------------------------------------------------------------

After Robert mentioned the strange comparator in the above benchmark:
It is just a leftover from the original testSort() test which needed that special order, because it compared the sorted BytesRefHash using a TreeSet of UTF16 strings.
For the benchmark the comparator has no real effect.

      was (Author: thetaphi):
    After Robert mentioned the strange comparator in the above patch:
It is just a leftover from the original testSort() test which needed that special order, because it compared the sorted BytesRefHash using a TreeSet of UTF16 strings.
For the benchamrk the comparator has no real effect.
  
> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Resolved: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler resolved LUCENE-2719.
-----------------------------------

    Resolution: Fixed

Committed 3.x branch revision: 1028042

Thanks to all for performance testing!

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-final-3x.patch, LUCENE-2719-final-trunk.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719-CollSupport.patch
                LUCENE-2719.patch

Latest updates.

There is also an additional patch which provides CollectionUtil, now also supporting in-place collection sorts which is much more perfromant for smaller collections. Collections.sort() in JDK does copy the List into array and calls Arrays.sort() which itsself does clone the array and after that copies the arraycontents using a ListIterator back to the List.

Before committing this, can somebody look into the o.a.l.index package, because I replaced some sorts for field names there. For commit points i used MergeSort, as I am not sure if it should be stable.

So just a confirmation, if we need stable sort for Indexer and Commit points would be fine.

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Yonik Seeley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924068#action_12924068 ] 

Yonik Seeley commented on LUCENE-2719:
--------------------------------------

bq. This also applies to Solr (Yonik?).

Yep - Solr also has it's own version of quicksort - PrimUtils.sort() to deal with sorting indexes (an int[]) instead of objects (parallel array sorting).

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719-final-3x.patch

filterdiff'ed patch for 3.x branch - we need that for commit mails, too. The changes in BytesRefHash are merged over to TermsHashPerField. This patch also removes useless synchronization!

After this also 3.x gets the imporved terms sorting and reduced code duplication.

I will commit soon.

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-final-3x.patch, LUCENE-2719-final-trunk.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719.patch

The quickSort in DocFieldProcessorPerThread also converted to ArrayUtil.quickSort().

Yonik: I will take care of PrimUtils!

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719.patch

Attached you find the patch. Robert offered to do benchmarks with Automaton.

The patch can be applied to a clean checkout, you no longer need to svn copy old SorterTemplate, as this is a almost complete rewrite.

This patch removes the CHANGES.txt entry for 3.x, as it readds the class. If we don't merge this to 3.x, the CHANGES should be reverted. As Lucene uses Arrays.sort(Object[]) which is slow at other places, I recommend to add it also to 3.x.

Please test the stuff with large -Dtests.multiplier! Maybe also verify my modified quickSort!

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719-CollSupport.patch

Robert tested yesterday and found out that SorterTemplate.quickSort is not as efficient as it could be. The general problem is:
- Quicksort needs the value of the pivot/partition element and the main sorting step compares this single value quite often
- For our in-place algorithm that only used swap(i,j) and compare(i,j), the main loop's swap statements needed an extra check that not the pivot index is swapped and so the pivot changes suddenly. Because of this when the index of the pivot is swapped, the pivot index value needed to be updated.

I changed SorterTemplate to look more like FieldComparator known from search. It now has not only swap(index1,index2) and compare(index1,index2), it also gets setPivot(index) [stores index' value as pivot] and comparePivot(index) [compares given index' value with previously stored pivot value]. Now the quicksort algorithm is identical to the one seen everywhere in Lucene before. We can now also implement the optimized one from harmony also seen in Solr's PrimUtil. I will look into this, if it makes sense (it makes not always sense as comparing and swapping is more intensive for non-native values!).

This has also some improvements to BytesRefHash, as there are less de-references of BytesRefs, because the main quickSort loop only compares an index with the in setPivot dereferenced BytesRefs. Before it did this on every compare step!

Robert: Can you supply your benchmark? Or test again :-)

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925414#action_12925414 ] 

Uwe Schindler commented on LUCENE-2719:
---------------------------------------

Committed trunk revision: 1027998

Now working on 3.x

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-final-trunk.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924162#action_12924162 ] 

Michael McCandless commented on LUCENE-2719:
--------------------------------------------

This is great Uwe!

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719.patch

Modified all sorts to use >>>1 instead /2 (have no real opinion about that).

bq. Yep - Solr also has it's own version of quicksort - PrimUtils.sort() to deal with sorting indexes (an int[]) instead of objects (parallel array sorting).

According to java.util.Arrays javadoc: The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993). This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance. 

If I change to SorterTemplate, we will degrade to good old quicksort. We could also upgrade SorterTemplate to use this algo, but I am not sure if thats easy because SorterTemplate only allows swap(index, index) and compare(index, index). But we cannot retrieve e.g. the pivot value. This was also one problem in porting BytesRefHash's quicksort, as it used the value of the pivot (this is why there is an additional check below the commented out assert in the current patch's algo).

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Commented: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925350#action_12925350 ] 

Uwe Schindler commented on LUCENE-2719:
---------------------------------------

After Robert mentioned the strange comparator in the above patch:
It is just a leftover from the original testSort() test which needed that special order, because it compared the sorted BytesRefHash using a TreeSet of UTF16 strings.
For the benchamrk the comparator has no real effect.

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


[jira] Updated: (LUCENE-2719) Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/LUCENE-2719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Uwe Schindler updated LUCENE-2719:
----------------------------------

    Attachment: LUCENE-2719-final-trunk.patch

Final patch, will get committed... now. It adds some contrib changes and changes.txt/notice.txt and javadocs.

> Re-add SorterTemplate and use it to provide fast ArraySorting and replace BytesRefHash sorting
> ----------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2719
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.1
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1, 4.0
>
>         Attachments: LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-CollSupport.patch, LUCENE-2719-final-trunk.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch, LUCENE-2719.patch
>
>
> This patch adds back an optimized and rewritten SorterTemplate back to Lucene (removed after release of 3.0). It is of use for several components:
> - Automaton: Automaton needs to sort States and other things. Using Arrays.sort() is slow, because it clones internally to ensure stable search. This component is much faster. This patch adds Arrays.sort() replacements in ArrayUtil that work with natural order or using a Comparator<?>. You can choose between quickSort and mergeSort.
> - BytesRefHash uses another QuickSort algorithm without insertionSort for very short ord arrays. This class uses SorterTemplate to provide the same with insertionSort fallback in a very elegant way. Ideally this class can be used everywhere, where the sort algorithm needs to be separated from the underlying data and you can implement a swap() and compare() function (that get slot numbers instead of real values). This also applies to Solr (Yonik?).
> SorterTemplate provides quickSort and mergeSort algorithms. Internally for short arrays, it automatically chooses insertionSort (like JDK's Arrays). The quickSort algorith was copied modified from old BytesRefHash. This new class only shares MergeSort with the original CGLIB SorterTemplate, which is no longer maintained.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org