You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Benson Margulies <bi...@gmail.com> on 2009/12/29 14:23:20 UTC

Re: [jira] Commented: (MAHOUT-230) Replace org.apache.mahout.math.Sorting with code of clear provenance

Simple answer: the Harmony team didn't code as well as the Sun people did.

This is not my metier, so if someone else can suggest algorithmic
improvements ...

On Tue, Dec 29, 2009 at 8:06 AM, Robin Anil (JIRA) <ji...@apache.org> wrote:
>
>    [ https://issues.apache.org/jira/browse/MAHOUT-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795047#action_12795047 ]
>
> Robin Anil commented on MAHOUT-230:
> -----------------------------------
>
> I ran the SortingTest instead of 3.2 sec it now takes 5.2 seconds. I repeated the patch and rechecked. Any idea why the perf dip?  Notice the perf drop in the second block  i.e. after the line break in each block
>
> {code:xml|title=Original Output}
>
>  <testcase time="0.016" classname="org.apache.mahout.math.SortingTest" name="testBinarySearch"/>
>  <testcase time="0.015" classname="org.apache.mahout.math.SortingTest" name="testBinarySearchObjects"/>
>  <testcase time="0.032" classname="org.apache.mahout.math.SortingTest" name="testQuickSortBytes"/>
>  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortChars"/>
>  <testcase time="0.188" classname="org.apache.mahout.math.SortingTest" name="testQuickSortInts"/>
>  <testcase time="0.171" classname="org.apache.mahout.math.SortingTest" name="testQuickSortLongs"/>
>  <testcase time="0.204" classname="org.apache.mahout.math.SortingTest" name="testQuickSortShorts"/>
>  <testcase time="0.218" classname="org.apache.mahout.math.SortingTest" name="testQuickSortFloats"/>
>  <testcase time="0.235" classname="org.apache.mahout.math.SortingTest" name="testQuickSortDoubles"/>
>  <testcase time="0.062" classname="org.apache.mahout.math.SortingTest" name="testMergeSortBytes"/>
>
>  <testcase time="0.313" classname="org.apache.mahout.math.SortingTest" name="testMergeSortChars"/>
>  <testcase time="0.281" classname="org.apache.mahout.math.SortingTest" name="testMergeSortInts"/>
>  <testcase time="0.391" classname="org.apache.mahout.math.SortingTest" name="testMergeSortLongs"/>
>  <testcase time="0.25" classname="org.apache.mahout.math.SortingTest" name="testMergeSortShorts"/>
>  <testcase time="0.359" classname="org.apache.mahout.math.SortingTest" name="testMergeSortFloats"/>
>  <testcase time="0.328" classname="org.apache.mahout.math.SortingTest" name="testMergeSortDoubles"/>
> {code}
>
> {code:xml|title=After Patching}
>
>  <testcase time="0.031" classname="org.apache.mahout.math.SortingTest" name="testBinarySearch"/>
>  <testcase time="0" classname="org.apache.mahout.math.SortingTest" name="testBinarySearchObjects"/>
>  <testcase time="0" classname="org.apache.mahout.math.SortingTest" name="testQuickSortBytes"/>
>  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortChars"/>
>  <testcase time="0.234" classname="org.apache.mahout.math.SortingTest" name="testQuickSortInts"/>
>  <testcase time="0.172" classname="org.apache.mahout.math.SortingTest" name="testQuickSortLongs"/>
>  <testcase time="0.266" classname="org.apache.mahout.math.SortingTest" name="testQuickSortShorts"/>
>  <testcase time="0.235" classname="org.apache.mahout.math.SortingTest" name="testQuickSortFloats"/>
>  <testcase time="0.25" classname="org.apache.mahout.math.SortingTest" name="testQuickSortDoubles"/>
>  <testcase time="0.015" classname="org.apache.mahout.math.SortingTest" name="testMergeSortBytes"/>
>
>  <testcase time="0.594" classname="org.apache.mahout.math.SortingTest" name="testMergeSortChars"/>
>  <testcase time="0.61" classname="org.apache.mahout.math.SortingTest" name="testMergeSortInts"/>
>  <testcase time="0.781" classname="org.apache.mahout.math.SortingTest" name="testMergeSortLongs"/>
>  <testcase time="0.562" classname="org.apache.mahout.math.SortingTest" name="testMergeSortShorts"/>
>  <testcase time="0.578" classname="org.apache.mahout.math.SortingTest" name="testMergeSortFloats"/>
>  <testcase time="0.61" classname="org.apache.mahout.math.SortingTest" name="testMergeSortDoubles"/>
> {code}
>
>> Replace org.apache.mahout.math.Sorting with code of clear provenance
>> --------------------------------------------------------------------
>>
>>                 Key: MAHOUT-230
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-230
>>             Project: Mahout
>>          Issue Type: Bug
>>          Components: Math
>>    Affects Versions: 0.3
>>            Reporter: Benson Margulies
>>            Assignee: Benson Margulies
>>             Fix For: 0.3
>>
>>         Attachments: replace-sorting.diff
>>
>>   Original Estimate: 72h
>>  Remaining Estimate: 72h
>>
>> org.apache.mahout.math.Sorting looks as if the original author borrowed from the Sun JRE, based on the private internal function names and contents. That code has a restrictive license. We need to take the equivalent file (java.util.Arrays) from Apache Harmony and use it as the basis for a clean replacement.
>> The problematic code are the quickSort and mergeSort functions, which extend 'Arrays' by supporting slices of arrays and custom sorting predicate functions.
>> One might also wistfully note that the more recent JDKs from Sun have deployed different (and one hopes) better sort algorithms that 1.5 and/or Harmony, so a really energetic person might build implementations in here to match. However, expediency calls for just bashing on the Harmony implementation to solve the problem at hand.
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>
>