You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by Tsuyoshi Ozawa <oz...@apache.org> on 2015/02/26 09:04:05 UTC

TimSort bug and its workaround

Hi hadoop developers,

Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
 is reported. How can we deal with this problem?

http://envisage-project.eu/timsort-specification-and-verification/
https://bugs.openjdk.java.net/browse/JDK-8072909

The bug causes ArrayIndexOutOfBoundsException if the number of element
is larger than 67108864.

We use the sort method at 77 places at least.
find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
77

One reasonable workaround is to set
java.util.Arrays.useLegacyMergeSort() by default.

Thanks,
- Tsuyoshi

Re: TimSort bug and its workaround

Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up.  If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it.  Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently).  I can't
think of any such place off the top of my head, but I might be missing
something.

Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>>  is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi

Re: TimSort bug and its workaround

Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up.  If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it.  Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently).  I can't
think of any such place off the top of my head, but I might be missing
something.

Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>>  is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi

Re: TimSort bug and its workaround

Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up.  If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it.  Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently).  I can't
think of any such place off the top of my head, but I might be missing
something.

Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>>  is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi

Re: TimSort bug and its workaround

Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up.  If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it.  Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently).  I can't
think of any such place off the top of my head, but I might be missing
something.

Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.

best,
Colin

On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>>  is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi

Re: TimSort bug and its workaround

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.

Thanks,
- Tsuyoshi

On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>  is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi

Re: TimSort bug and its workaround

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.

Thanks,
- Tsuyoshi

On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>  is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi

Re: TimSort bug and its workaround

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.

Thanks,
- Tsuyoshi

On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>  is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi

Re: TimSort bug and its workaround

Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.

Thanks,
- Tsuyoshi

On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>  is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort"  | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi