Posted to dev@drill.apache.org by Yash Sharma <ya...@gmail.com> on 2015/02/26 10:00:53 UTC

OpenJDK’s java.util.Collections.sort() Bug

As pointed out on the Hadoop mailing list -

OpenJDK’s java.util.Collections.sort() is broken: the default TimSort
implementation can throw an ArrayIndexOutOfBoundsException on certain
inputs with around 67108864 (2^26) elements or more.

I wonder whether we could ever have such a huge collection in Drill and
hit this bug.
Collections.sort is used in multiple places, including
DrillTextRecordReader. Do we need to consider a workaround for this?
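
To make the size in question concrete, here is a minimal sketch of a check that could run before a sort. The class name TimSortGuard, the constant, and the method are hypothetical and are not part of Drill or the JDK. The threshold is simply the element count reported for this bug, taken here as an assumption.

```java
class TimSortGuard {
    // Assumption: the element count reported for this bug, 67108864 == 2^26.
    static final int REPORTED_THRESHOLD = 1 << 26;

    // Returns true when a collection of this size is large enough that the
    // reported TimSort failure could, in principle, be triggered.
    static boolean mayHitTimSortBug(int size) {
        return size >= REPORTED_THRESHOLD;
    }

    public static void main(String[] args) {
        System.out.println(REPORTED_THRESHOLD);                   // 67108864
        System.out.println(mayHitTimSortBug(5_000_000));          // false
        System.out.println(mayHitTimSortBug(REPORTED_THRESHOLD)); // true
    }
}
```

A list of a few thousand files or endpoints is far below this size, so the check would only matter for code that sorts record-level data.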

Thoughts?

Links:
http://envisage-project.eu/timsort-specification-and-verification/

https://bugs.openjdk.java.net/browse/JDK-8072909

Re: OpenJDK’s java.util.Collections.sort() Bug

Posted by Yash Sharma <ya...@gmail.com>.
Makes sense.
We just need to make sure we don't use Collections.sort for sorting
actual data. As long as we don't, we should never hit this bug.

On Thu, Feb 26, 2015 at 4:28 PM, Steven Phillips <sp...@maprtech.com>
wrote:

> It looks like we are using the method in 5 different places in Drill. We
> are using it to sort lists of: files, drillbit endpoints, workunits,
> operator profiles, and columnIds.
>
> I can't imagine we are ever going to need to sort millions of those. So
> probably no need to worry about this bug.
>
> But we should keep it in mind for any future code that might want to use
> it.

Re: OpenJDK’s java.util.Collections.sort() Bug

Posted by Steven Phillips <sp...@maprtech.com>.
It looks like we are using the method in 5 different places in Drill. We
are using it to sort lists of: files, drillbit endpoints, workunits,
operator profiles, and columnIds.

I can't imagine we are ever going to need to sort millions of those. So
probably no need to worry about this bug.

But we should keep it in mind for any future code that might want to use it.
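
One way to keep this in mind for future code would be a small wrapper that fails loudly instead of risking the bug. The sketch below is only an illustration: GuardedSort, MAX_SAFE_SIZE, and the sample file names are hypothetical, nothing like this exists in Drill today, and the size limit is an assumption based on the element count reported for the bug.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class GuardedSort {
    // Assumption: stay strictly below the reported problem size of 2^26 elements.
    static final int MAX_SAFE_SIZE = (1 << 26) - 1;

    // Delegates to Collections.sort, but rejects lists large enough that the
    // reported TimSort failure could apply.
    static <T> void sort(List<T> list, Comparator<? super T> cmp) {
        if (list.size() > MAX_SAFE_SIZE) {
            throw new IllegalArgumentException(
                "List of size " + list.size() + " is too large to sort safely");
        }
        Collections.sort(list, cmp);
    }

    public static void main(String[] args) {
        // Small metadata lists, like the ones mentioned above, pass straight through.
        List<String> files = new ArrayList<>(Arrays.asList("b.csv", "a.csv", "c.csv"));
        sort(files, String.CASE_INSENSITIVE_ORDER);
        System.out.println(files); // [a.csv, b.csv, c.csv]
    }
}
```

Throwing rather than silently switching algorithms keeps the behavior obvious: any caller that ever reaches this size is sorting real data and should be using a different mechanism anyway.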


-- 
 Steven Phillips
 Software Engineer

 mapr.com