Posted to dev@orc.apache.org by David <da...@gmail.com> on 2021/09/01 14:51:30 UTC

Re: ORC 2.0

Hello Dongjoon,

> Without any evidence, there is no point in dropping at this point.
Fair enough.  I'll see what I can do.

> So, what is your stance about Owen's PR?
Amazing work.  I just think it's too big for any one person to review.  We
should take an incremental approach, class by class: deprecate any
functionality that depends on Hadoop, and then remove it all in V2.

Thanks.

On Tue, Aug 31, 2021 at 11:44 AM Dongjoon Hyun <do...@gmail.com>
wrote:

> Could you share the JDK11-only improvement result first?
> Without any evidence, there is no point in dropping at this point.
>
> > I suspect that there are new APIs within JDK 11 that will
> > enhance performance of the ORC.
>
> So, what is your stance about Owen's PR?
>
> > I am then proposing that we begin the process of iteratively removing the
> > hadoop dependencies.
>
> Dongjoon.
>
> On Tue, Aug 31, 2021 at 6:12 AM David <da...@gmail.com> wrote:
>
> > Hello,
> >
> > Thank you for your interest.
> >
> > I am proposing tagging the 1.x line and reserving it for JDK 8, and
> > moving the 'main' branch to build on a minimum of JDK 11.
> >
> > Note that the Premier Support for JDK8 expires in March 2022.
> >
> > https://www.oracle.com/java/technologies/java-se-support-roadmap.html
> >
> > I suspect that there are new APIs within JDK 11 that will enhance
> > the performance of ORC.  In particular, I see that there are a bunch of
> > improvements around comparing byte arrays (which ORC does quite a bit of).
> >
> > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/util/Arrays.java#L2700
> >
> > https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/jdk/internal/util/ArraysSupport.java#L228
> >
> > I am then proposing that we begin the process of iteratively removing the
> > hadoop dependencies.  At a minimum ORC 2.0 is released once that work is
> > completed.
> >
> > Thanks.
> >
> > On Mon, Aug 30, 2021 at 3:45 PM Dongjoon Hyun <do...@gmail.com>
> > wrote:
> >
> > > Thank you for sending an email.
> > > Could you elaborate more about your background?
> > >
> > > The following is my opinion at first glance.
> > >
> > > For (1), Apache ORC supports Java 8/11/17
> > > without any problem, as you can see from our CI test coverage.
> > > I'm -1 on dropping Java 8 support because
> > > we still have lots of customers who are on JDK 8.
> > > Specifically, the Apache Spark distribution should be built with JDK 8.
> > >
> > > For (2), there is Owen's PR in the community.
> > >
> > >     https://github.com/apache/orc/pull/641
> > >     ORC-508 remove hadoop dependency
> > >
> > > So, I'm wondering if you are
> > >     A. Proposing a new PR, or
> > >     B. Taking over Owen's PR
> > >
> > > Thanks,
> > > Dongjoon.
> > >
> > > On Mon, Aug 30, 2021 at 6:11 AM David <da...@gmail.com> wrote:
> > >
> > > > Hello Gang,
> > > >
> > > > Thank you for being very accommodating and welcoming to my sometimes
> > > > tedious pull requests.
> > > >
> > > > I'm not sure of the capacity of the participants of the project, but I
> > > > would like to propose starting on ORC v2 with the following objectives:
> > > >
> > > > 1. Moving to JDK 11 (LTS)
> > > > 2. Removing core ORC's direct dependencies on Hadoop (and scrubbing
> > > > many of the mentions of "Hadoop" from the website).
> > > >
> > > > Thanks,
> > > > David (Belugabehr)
> > > >
> > >
> >
>
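
For readers following the byte-array point raised in the thread, here is a minimal sketch of the kind of change being discussed. It is not taken from the ORC code base: the class name ByteCompareSketch and both method names are hypothetical, and unsigned lexicographic ordering is an assumption made only for illustration. Arrays.compareUnsigned and Arrays.mismatch are real java.util.Arrays methods added in JDK 9, so neither is available when building for JDK 8. Whether they yield a measurable gain for ORC is exactly the evidence asked for above, and this sketch does not establish it.

```java
import java.util.Arrays;

// Hypothetical illustration only; not ORC code.
public class ByteCompareSketch {

    // Hand-written loop of the kind that JDK 8 compatible code needs:
    // unsigned, lexicographic comparison of two byte arrays.
    public static int compareLoop(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // treat the byte as unsigned
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared positions are equal, so the shorter array sorts first.
        return a.length - b.length;
    }

    // JDK 9+ library call with the same ordering. The magnitude of the
    // result may differ from compareLoop; only the sign is comparable.
    public static int compareJdk(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] left = {1, 2, (byte) 0x80};
        byte[] right = {1, 2, 3};
        System.out.println(Integer.signum(compareLoop(left, right)));
        System.out.println(Integer.signum(compareJdk(left, right)));
        // Index of the first differing byte, or -1 if the arrays are equal.
        System.out.println(Arrays.mismatch(left, right));
    }
}
```

Any real comparison between the two approaches would need a benchmark on ORC's own workloads, built and run on JDK 11.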