You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Colin P. McCabe" <cm...@apache.org> on 2015/03/02 19:19:18 UTC

Re: TimSort bug and its workaround

Thanks for bringing this up.  If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it.  Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently).  I can't
think of any such place off the top of my head, but I might be missing
something.

Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>>  is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi