Posted to dev@impala.apache.org by Tim Armstrong <ta...@cloudera.com> on 2020/02/12 23:45:36 UTC

Re: Impala 4.x (and 3.x)

Do we plan to switch the default catalog implementation to the local
catalog as well?

On Wed, Jan 29, 2020 at 3:41 PM Tim Armstrong <ta...@cloudera.com>
wrote:

> We haven't generally had clear guarantees about what OS versions we're
> going to support. I ran into issues with
> https://issues.apache.org/jira/browse/IMPALA-8508 where we couldn't run
> python 3 on centos 6.4, for example.
>
> Would it be possible to explicitly state what OS versions we're not going
> to support in Impala 4.0?
>
> On Mon, Jan 27, 2020 at 12:05 PM Joe McDonnell <jo...@cloudera.com>
> wrote:
>
>> Given the positive feedback, I will begin moving forward on the proposed
>> plan. I will start a new thread to discuss the Impala 3.4 release, and we
>> can start collecting/discussing the breaking changes that we want for
>> Impala 4.0.
>>
>> Feedback about the plan is still welcome on this thread. Alternatively, any
>> concerns can be raised in a new thread here on dev@.
>>
>> Thanks,
>> Joe
>>
>> On Wed, Jan 22, 2020 at 9:06 AM Laszlo Gaal <la...@cloudera.com>
>> wrote:
>>
>> > +1
>> >
>> > I'd also add that bumping the major version of Impala opens the window
>> > for introducing breaking changes.
>> >
>> > I don't intend to hijack this mail thread for that purpose,
>> > but I'd like to suggest compiling such a list in the context of the
>> > version bump.
>> >
>> > Thanks,
>> >
>> >     - LaszloG
>> >
>> > On Tue, Jan 21, 2020 at 6:48 PM Anurag Mantripragada <anurag@cloudera.com>
>> > wrote:
>> >
>> > > This makes sense.
>> > > +1
>> > >
>> > > On Tue, Jan 21, 2020 at 9:03 AM Andrew Sherman <asherman@cloudera.com>
>> > > wrote:
>> > >
>> > > > +1
>> > > >
>> > > >
>> > > > On Tue, Jan 21, 2020 at 8:28 AM Sahil Takiar <takiar.sahil@gmail.com>
>> > > > wrote:
>> > > >
>> > > > > +1 makes sense to me.
>> > > > >
>> > > > > On Mon, Jan 20, 2020 at 4:55 PM Tim Armstrong <tarmstrong@cloudera.com>
>> > > > > wrote:
>> > > > >
>> > > > > > I think this proposal makes sense - we've done well in enabling parallel
>> > > > > > development for different Hive versions so far, but it is a burden. E.g. we
>> > > > > > still don't have precommit tests for Hive 3+ (I like that name) and I don't
>> > > > > > know that we want to go about making the suite of precommit tests even
>> > > > > > larger.
>> > > > > >
>> > > > > > On Fri, Jan 17, 2020 at 4:29 PM Joe McDonnell <joemcdonnell@cloudera.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > I wanted to start a conversation around moving to develop against Hive 3+
>> > > > > > > by default. (I describe this as Hive 3+ because it is close to Hive master,
>> > > > > > > which is well beyond any released Hive 3.) There has been considerable
>> > > > > > > development effort towards implementing features integrating Impala with
>> > > > > > > Hive 3+ and Hive ACID. This is currently developed under the
>> > > > > > > USE_CDP_HIVE=true configuration while regular development has continued
>> > > > > > > with Hive 2. The Hive 3+ development is now stable enough to be used for
>> > > > > > > regular development. It would be nice to reduce our test and compatibility
>> > > > > > > matrix and have a unified development environment.
>> > > > > > >
>> > > > > > > Changing the major version of Hive is a breaking change, so it would
>> > > > > > > require an Impala 4.x code line. I have a specific proposal, but this is
>> > > > > > > mainly a frame for getting the discussion going.
>> > > > > > >
>> > > > > > > I propose that we release Impala 3.4.0 and then update master to 4.0 and
>> > > > > > > allow breaking changes until the Impala 4.0 release. The main breaking
>> > > > > > > change would be to set USE_CDP_HIVE=true, enabling Hive 3+ development by
>> > > > > > > default. The Hive 2 configuration would be removed over time. Other
>> > > > > > > breaking changes can be proposed and voted on.
>> > > > > > >
>> > > > > > > If there are developers interested in maintaining a 3.x branch, we can
>> > > > > > > create this branch and add appropriate support to any infrastructure (e.g.
>> > > > > > > bin/push_to_asf.py) to allow that.
>> > > > > > >
>> > > > > > > Thoughts?
>> > > > > > >
>> > > > > > > Thanks,
>> > > > > > >
>> > > > > > > Joe McDonnell
>> > > > > > >
>> > > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > Sahil Takiar
>> > > > > Software Engineer
>> > > > > takiar.sahil@gmail.com | (510) 673-0309
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: Impala 4.x (and 3.x)

Posted by Tim Armstrong <ta...@cloudera.com>.
I'd be ok with saying that we'll make supporting CentOS 7 and Ubuntu
16.04/18.04 explicit goals for Impala 4.0. And we'll explicitly bless
breaking compat with CentOS 6 and Ubuntu 14.04.

It looks like SLES 12 is still in support for years:
https://en.wikipedia.org/wiki/SUSE_Linux_Enterprise. But I don't think we
have anyone in the community regularly testing it (if there is, they should
pipe up). So maybe we can say that we will support it in a best-effort
way, i.e. accept patches to fix breakages and generally try to avoid
unnecessary breakage.

On Fri, Feb 28, 2020 at 1:38 PM Laszlo Gaal <la...@cloudera.com>
wrote:

> Tim asked:
>
> > We haven't generally had clear guarantees about what OS versions we're
> > going to support. I ran into issues with
> > https://issues.apache.org/jira/browse/IMPALA-8508 where we couldn't run
> > python 3 on centos 6.4, for example.
> > Would it be possible to explicitly state what OS versions we're not going
> > to support in Impala 4.0?
>
> +1
> We tried to set clear expectations for Impala 3.0 when it started, see
> IMPALA-7273, IMPALA-7274, and IMPALA-6826 (for adding Ubuntu 18.04)
>
> I agree with Tim's proposal that the project should clearly state which
> operating system versions it supports.
>
> I propose that the following OS platforms should be retired for Impala 4.0:
> - CentOS 6: according to https://wiki.centos.org/About/Product maintenance
>   updates will stop at the end of this November, and it also has an ancient
>   kernel version.
>   CentOS 7 has been available for almost 6 years now, so it is quite mature.
>   Additionally, dropping CentOS 6 would also let us drop the Python 2.6
>   compatibility requirement for our Python scripts.
> - Ubuntu 14.04: https://ubuntu.com/about/release-cycle shows that it has been in
>   Extended Security Maintenance (the last phase of the lifecycle) for almost a
>   year now.
>   Ubuntu 16.04 and 18.04 have been available for years, and 20.04 is expected
>   later this spring.
>
> I wonder what the community's opinion is about SLES 12 (mentioned in
> IMPALA-7273).
>
> We could also express our intention to support CentOS 8 and Ubuntu 20.04
> (when it becomes available), but these are just additions, not breaking changes.

Re: Impala 4.x (and 3.x)

Posted by Laszlo Gaal <la...@cloudera.com>.
Tim asked:

> We haven't generally had clear guarantees about what OS versions we're
> going to support. I ran into issues with
> https://issues.apache.org/jira/browse/IMPALA-8508 where we couldn't run
> python 3 on centos 6.4, for example.
> Would it be possible to explicitly state what OS versions we're not going
> to support in Impala 4.0?

+1
We tried to set clear expectations for Impala 3.0 when it started, see
IMPALA-7273, IMPALA-7274, and IMPALA-6826 (for adding Ubuntu 18.04)

I agree with Tim's proposal that the project should clearly state which
operating system versions it supports.

I propose that the following OS platforms should be retired for Impala 4.0:
- CentOS 6: according to https://wiki.centos.org/About/Product maintenance
  updates will stop at the end of this November, and it also has an ancient
  kernel version.
  CentOS 7 has been available for almost 6 years now, so it is quite mature.
  Additionally, dropping CentOS 6 would also let us drop the Python 2.6
  compatibility requirement for our Python scripts.
- Ubuntu 14.04: https://ubuntu.com/about/release-cycle shows that it has been in
  Extended Security Maintenance (the last phase of the lifecycle) for almost a
  year now.
  Ubuntu 16.04 and 18.04 have been available for years, and 20.04 is expected
  later this spring.

I wonder what the community's opinion is about SLES 12 (mentioned in
IMPALA-7273).

We could also express our intention to support CentOS 8 and Ubuntu 20.04
(when it becomes available), but these are just additions, not breaking changes.

On Thu, Feb 13, 2020 at 1:58 AM Tim Armstrong <ta...@cloudera.com>
wrote:

> Sounds good to me - that approach has been successful before - i.e.
> switching the default in the major version, deprecating the old one, and
> then removing it some number of releases later. Although in some cases we
> have been slow to remove the code, e.g. the old decimal behaviour is still
> accessible.

Re: Impala 4.x (and 3.x)

Posted by Tim Armstrong <ta...@cloudera.com>.
Sounds good to me - that approach has been successful before - i.e.
switching the default in the major version, deprecating the old one, and
then removing it some number of releases later. Although in some cases we
have been slow to remove the code, e.g. the old decimal behaviour is still
accessible.

On Wed, Feb 12, 2020 at 4:37 PM Vihang Karajgaonkar <vi...@cloudera.com>
wrote:

> Yes, I think changing to the local catalog can be done in the 4.0 branch.
> Based on the feedback received here, I think we can start removing the code
> for the 3.x catalog implementation after we fill in some of the gaps in the
> local catalog. We can selectively decide which of those gaps (e.g. external
> data sources, HDFS caching) are important and need to be supported in local
> catalog mode. Having just one catalog mode will greatly simplify the
> implementation of newer features like catalog HA and fine-grained
> partition-level metadata.

Re: Impala 4.x (and 3.x)

Posted by Vihang Karajgaonkar <vi...@cloudera.com>.
Yes, I think changing to the local catalog can be done in the 4.0 branch.
Based on the feedback received here, I think we can start removing the code
for the 3.x catalog implementation after we fill in some of the gaps in the
local catalog. We can selectively decide which of those gaps (e.g. external
data sources, HDFS caching) are important and need to be supported in local
catalog mode. Having just one catalog mode will greatly simplify the
implementation of newer features like catalog HA and fine-grained
partition-level metadata.

On Wed, Feb 12, 2020 at 3:46 PM Tim Armstrong <ta...@cloudera.com>
wrote:

> Do we plan to switch the default catalog implementation to the local
> catalog as well?