You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Benson Margulies (JIRA)" <ji...@apache.org> on 2009/12/27 04:07:29 UTC

[jira] Created: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Replace org.apache.mahout.math.Sorting with code of clear provenance
--------------------------------------------------------------------

                 Key: MAHOUT-230
                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
             Project: Mahout
          Issue Type: Bug
          Components: Math
    Affects Versions: 0.3
            Reporter: Benson Margulies
             Fix For: 0.3


org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.

The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 

One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by Benson Margulies <bi...@gmail.com>.
Simple answer: the Harmony team didn't code as well as the Sun people did.

This is not my metier, so if someone else can suggest algorithmic
improvements ...

On Tue, Dec 29, 2009 at 8:06 AM, Robin Anil (JIRA) <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795047#action_12795047 ]
>
> Robin Anil commented on MAHOUT-230:
> -----------------------------------
>
> I ran the SortingTest instead of 3.2 sec it now takes 5.2 seconds. I repeated the patch and rechecked. Any idea why the perf dip?  Notice the perf drop in the second block  i.e. after the line break in each block
>
> {code:xml|title=Original Output}
>
>  <testcase time="0.016" classname="org.apache.mahout.math.SortingTest" name="testBinarySearch"/>
>  <testcase time="0.015" classname="org.apache.mahout.math.SortingTest" name="testBinarySearchObjects"/>
>  <testcase time="0.032" classname="org.apache.mahout.math.SortingTest" name="testQuickSortBytes"/>
>  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortChars"/>
>  <testcase time="0.188" classname="org.apache.mahout.math.SortingTest" name="testQuickSortInts"/>
>  <testcase time="0.171" classname="org.apache.mahout.math.SortingTest" name="testQuickSortLongs"/>
>  <testcase time="0.204" classname="org.apache.mahout.math.SortingTest" name="testQuickSortShorts"/>
>  <testcase time="0.218" classname="org.apache.mahout.math.SortingTest" name="testQuickSortFloats"/>
>  <testcase time="0.235" classname="org.apache.mahout.math.SortingTest" name="testQuickSortDoubles"/>
>  <testcase time="0.062" classname="org.apache.mahout.math.SortingTest" name="testMergeSortBytes"/>
>
>  <testcase time="0.313" classname="org.apache.mahout.math.SortingTest" name="testMergeSortChars"/>
>  <testcase time="0.281" classname="org.apache.mahout.math.SortingTest" name="testMergeSortInts"/>
>  <testcase time="0.391" classname="org.apache.mahout.math.SortingTest" name="testMergeSortLongs"/>
>  <testcase time="0.25" classname="org.apache.mahout.math.SortingTest" name="testMergeSortShorts"/>
>  <testcase time="0.359" classname="org.apache.mahout.math.SortingTest" name="testMergeSortFloats"/>
>  <testcase time="0.328" classname="org.apache.mahout.math.SortingTest" name="testMergeSortDoubles"/>
> {code}
>
> {code:xml|title=After Patching}
>
>  <testcase time="0.031" classname="org.apache.mahout.math.SortingTest" name="testBinarySearch"/>
>  <testcase time="0" classname="org.apache.mahout.math.SortingTest" name="testBinarySearchObjects"/>
>  <testcase time="0" classname="org.apache.mahout.math.SortingTest" name="testQuickSortBytes"/>
>  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortChars"/>
>  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortInts"/>
>  <testcase time="0.172" classname="org.apache.mahout.math.SortingTest" name="testQuickSortLongs"/>
>  <testcase time="0.266" classname="org.apache.mahout.math.SortingTest" name="testQuickSortShorts"/>
>  <testcase time="0.235" classname="org.apache.mahout.math.SortingTest" name="testQuickSortFloats"/>
>  <testcase time="0.25" classname="org.apache.mahout.math.SortingTest" name="testQuickSortDoubles"/>
>  <testcase time="0.015" classname="org.apache.mahout.math.SortingTest" name="testMergeSortBytes"/>
>
>  <testcase time="0.594" classname="org.apache.mahout.math.SortingTest" name="testMergeSortChars"/>
>  <testcase time="0.61" classname="org.apache.mahout.math.SortingTest" name="testMergeSortInts"/>
>  <testcase time="0.781" classname="org.apache.mahout.math.SortingTest" name="testMergeSortLongs"/>
>  <testcase time="0.562" classname="org.apache.mahout.math.SortingTest" name="testMergeSortShorts"/>
>  <testcase time="0.578" classname="org.apache.mahout.math.SortingTest" name="testMergeSortFloats"/>
>  <testcase time="0.61" classname="org.apache.mahout.math.SortingTest" name="testMergeSortDoubles"/>
> {code}
>
>> Replace org.apache.mahout.math.Sorting with code of clear provenance
>> --------------------------------------------------------------------
>>
>>                 Key: MAHOUT-230
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>>             Project: Mahout
>>          Issue Type: Bug
>>          Components: Math
>>    Affects Versions: 0.3
>>            Reporter: Benson Margulies
>>            Assignee: Benson Margulies
>>             Fix For: 0.3
>>
>>         Attachments: replace-sorting.diff
>>
>>   Original Estimate: 72h
>>  Remaining Estimate: 72h
>>
>> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
>> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions.
>> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>

[jira] Updated: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Benson Margulies (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benson Margulies updated MAHOUT-230:
------------------------------------

    Attachment: replace-sorting.diff

Here is the new Sorting.java. The patch is, of course, illegible. Apply it and read the results.

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795055#action_12795055 ] 

Grant Ingersoll commented on MAHOUT-230:
----------------------------------------

I seem to recall Lucene having it's own merge sort, maybe we should look there?  Not saying it's faster, but might be worth looking at.

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Benson Margulies (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795060#action_12795060 ] 

Benson Margulies commented on MAHOUT-230:
-----------------------------------------

Heck, a read of the reference cited in the JDK 1.6 doc would prove rewarding, no doubt. Anyone else willing?

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Benson Margulies (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benson Margulies updated MAHOUT-230:
------------------------------------

    Status: Patch Available  (was: Open)

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Benson Margulies
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795053#action_12795053 ] 

Grant Ingersoll commented on MAHOUT-230:
----------------------------------------

Committed revision 894390.

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned MAHOUT-230:
--------------------------------------

    Assignee: Grant Ingersoll  (was: Benson Margulies)

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795047#action_12795047 ] 

Robin Anil commented on MAHOUT-230:
-----------------------------------

I ran the SortingTest instead of 3.2 sec it now takes 5.2 seconds. I repeated the patch and rechecked. Any idea why the perf dip?  Notice the perf drop in the second block  i.e. after the line break in each block

{code:xml|title=Original Output} 

  <testcase time="0.016" classname="org.apache.mahout.math.SortingTest" name="testBinarySearch"/>
  <testcase time="0.015" classname="org.apache.mahout.math.SortingTest" name="testBinarySearchObjects"/>
  <testcase time="0.032" classname="org.apache.mahout.math.SortingTest" name="testQuickSortBytes"/>
  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortChars"/>
  <testcase time="0.188" classname="org.apache.mahout.math.SortingTest" name="testQuickSortInts"/>
  <testcase time="0.171" classname="org.apache.mahout.math.SortingTest" name="testQuickSortLongs"/>
  <testcase time="0.204" classname="org.apache.mahout.math.SortingTest" name="testQuickSortShorts"/>
  <testcase time="0.218" classname="org.apache.mahout.math.SortingTest" name="testQuickSortFloats"/>
  <testcase time="0.235" classname="org.apache.mahout.math.SortingTest" name="testQuickSortDoubles"/>
  <testcase time="0.062" classname="org.apache.mahout.math.SortingTest" name="testMergeSortBytes"/>

  <testcase time="0.313" classname="org.apache.mahout.math.SortingTest" name="testMergeSortChars"/>
  <testcase time="0.281" classname="org.apache.mahout.math.SortingTest" name="testMergeSortInts"/>
  <testcase time="0.391" classname="org.apache.mahout.math.SortingTest" name="testMergeSortLongs"/>
  <testcase time="0.25" classname="org.apache.mahout.math.SortingTest" name="testMergeSortShorts"/>
  <testcase time="0.359" classname="org.apache.mahout.math.SortingTest" name="testMergeSortFloats"/>
  <testcase time="0.328" classname="org.apache.mahout.math.SortingTest" name="testMergeSortDoubles"/>
{code} 

{code:xml|title=After Patching} 

  <testcase time="0.031" classname="org.apache.mahout.math.SortingTest" name="testBinarySearch"/>
  <testcase time="0" classname="org.apache.mahout.math.SortingTest" name="testBinarySearchObjects"/>
  <testcase time="0" classname="org.apache.mahout.math.SortingTest" name="testQuickSortBytes"/>
  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortChars"/>
  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortInts"/>
  <testcase time="0.172" classname="org.apache.mahout.math.SortingTest" name="testQuickSortLongs"/>
  <testcase time="0.266" classname="org.apache.mahout.math.SortingTest" name="testQuickSortShorts"/>
  <testcase time="0.235" classname="org.apache.mahout.math.SortingTest" name="testQuickSortFloats"/>
  <testcase time="0.25" classname="org.apache.mahout.math.SortingTest" name="testQuickSortDoubles"/>
  <testcase time="0.015" classname="org.apache.mahout.math.SortingTest" name="testMergeSortBytes"/>

  <testcase time="0.594" classname="org.apache.mahout.math.SortingTest" name="testMergeSortChars"/>
  <testcase time="0.61" classname="org.apache.mahout.math.SortingTest" name="testMergeSortInts"/>
  <testcase time="0.781" classname="org.apache.mahout.math.SortingTest" name="testMergeSortLongs"/>
  <testcase time="0.562" classname="org.apache.mahout.math.SortingTest" name="testMergeSortShorts"/>
  <testcase time="0.578" classname="org.apache.mahout.math.SortingTest" name="testMergeSortFloats"/>
  <testcase time="0.61" classname="org.apache.mahout.math.SortingTest" name="testMergeSortDoubles"/>
{code} 

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Benson Margulies
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795048#action_12795048 ] 

Grant Ingersoll commented on MAHOUT-230:
----------------------------------------

I think we need to commit and then worry about performance.  The legal issues outweigh the performance issues at this point.  I'll commit and then we can open a new one for performance..

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Grant Ingersoll (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated MAHOUT-230:
-----------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Anyone can.

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Robin Anil (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795058#action_12795058 ] 

Robin Anil commented on MAHOUT-230:
-----------------------------------

What about hadoop?. I guess its their core operation.

http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/MergeSort.html

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Benson Margulies (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benson Margulies reassigned MAHOUT-230:
---------------------------------------

    Assignee: Benson Margulies

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Benson Margulies
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Benson Margulies (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795054#action_12795054 ] 

Benson Margulies commented on MAHOUT-230:
-----------------------------------------

And it's all in the merge sorts.

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Posted by "Benson Margulies (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795666#action_12795666 ] 

Benson Margulies commented on MAHOUT-230:
-----------------------------------------

Grant, do you resolve or do I?

> Replace org.apache.mahout.math.Sorting with code of clear provenance
> --------------------------------------------------------------------
>
>                 Key: MAHOUT-230
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.3
>            Reporter: Benson Margulies
>            Assignee: Grant Ingersoll
>             Fix For: 0.3
>
>         Attachments: replace-sorting.diff
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions. 
> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.