You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by Tsuyoshi Ozawa <oz...@apache.org> on 2015/02/26 09:04:05 UTC
TimSort bug and its workaround
Hi hadoop developers,
Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
is reported. How can we deal with this problem?
http://envisage-project.eu/timsort-specification-and-verification/
https://bugs.openjdk.java.net/browse/JDK-8072909
The bug causes ArrayIndexOutOfBoundsException if the number of element
is larger than 67108864.
We use the sort method at 77 places at least.
find . -name "*.java" | xargs grep "Collections.sort" | wc -l
77
One reasonable workaround is to set
java.util.Arrays.useLegacyMergeSort() by default.
Thanks,
- Tsuyoshi
Re: TimSort bug and its workaround
Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up. If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it. Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently). I can't
think of any such place off the top of my head, but I might be missing
something.
Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.
best,
Colin
On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>> is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi
Re: TimSort bug and its workaround
Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up. If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it. Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently). I can't
think of any such place off the top of my head, but I might be missing
something.
Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.
best,
Colin
On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>> is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi
Re: TimSort bug and its workaround
Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up. If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it. Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently). I can't
think of any such place off the top of my head, but I might be missing
something.
Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.
best,
Colin
On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>> is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi
Re: TimSort bug and its workaround
Posted by "Colin P. McCabe" <cm...@apache.org>.
Thanks for bringing this up. If you can find any place where an array
might realistically be larger than 67 million elements, then I guess
file a JIRA for it. Also this array needs to be of objects, not of
primitives (quicksort is used for those in jdk7, apparently). I can't
think of any such place off the top of my head, but I might be missing
something.
Potentially we could also file a JIRA just as a place to gather the
discussion, even if the only outcome is a release note.
best,
Colin
On Thu, Feb 26, 2015 at 12:16 AM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Maybe we should discuss whether the elements of array can be larger
> than 67108864 in our use cases - e.g. FairScheduler uses
> Collection.sort(), but the number of job isn't larger than 67108864 in
> many use cases, so we can keep using it. It's also reasonable that we
> choose to use safe algorithms for stability.
>
> Thanks,
> - Tsuyoshi
>
> On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
>> Hi hadoop developers,
>>
>> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
>> is reported. How can we deal with this problem?
>>
>> http://envisage-project.eu/timsort-specification-and-verification/
>> https://bugs.openjdk.java.net/browse/JDK-8072909
>>
>> The bug causes ArrayIndexOutOfBoundsException if the number of element
>> is larger than 67108864.
>>
>> We use the sort method at 77 places at least.
>> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
>> 77
>>
>> One reasonable workaround is to set
>> java.util.Arrays.useLegacyMergeSort() by default.
>>
>> Thanks,
>> - Tsuyoshi
Re: TimSort bug and its workaround
Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.
Thanks,
- Tsuyoshi
On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
> is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi
Re: TimSort bug and its workaround
Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.
Thanks,
- Tsuyoshi
On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
> is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi
Re: TimSort bug and its workaround
Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.
Thanks,
- Tsuyoshi
On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
> is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi
Re: TimSort bug and its workaround
Posted by Tsuyoshi Ozawa <oz...@apache.org>.
Maybe we should discuss whether the elements of array can be larger
than 67108864 in our use cases - e.g. FairScheduler uses
Collection.sort(), but the number of job isn't larger than 67108864 in
many use cases, so we can keep using it. It's also reasonable that we
choose to use safe algorithms for stability.
Thanks,
- Tsuyoshi
On Thu, Feb 26, 2015 at 5:04 PM, Tsuyoshi Ozawa <oz...@apache.org> wrote:
> Hi hadoop developers,
>
> Last 2 weeks, a bug of JDK about TimSort, related to Collections#sort,
> is reported. How can we deal with this problem?
>
> http://envisage-project.eu/timsort-specification-and-verification/
> https://bugs.openjdk.java.net/browse/JDK-8072909
>
> The bug causes ArrayIndexOutOfBoundsException if the number of element
> is larger than 67108864.
>
> We use the sort method at 77 places at least.
> find . -name "*.java" | xargs grep "Collections.sort" | wc -l
> 77
>
> One reasonable workaround is to set
> java.util.Arrays.useLegacyMergeSort() by default.
>
> Thanks,
> - Tsuyoshi