Posted to dev@impala.apache.org by Quanlong Huang <hu...@gmail.com> on 2019/01/28 03:09:42 UTC

Move forward branch-2.x

Hi friends,

It's time to move branch-2.x forward. Though we've made great
features/improvements in Impala-3.x, people’s impression of Impala is still
stuck in the 2.x era. Most of them are still using Hadoop 2 in production and
have no chance to try Impala-3.x. I believe Hadoop 2 will still be used for
some years. It would be a pity to lose those users.

I'd like to try moving branch-2.x forward. I hope you can give some
suggestions! There are two proposals I can come up with:
(a) Cherry-pick mature improvements/features into branch-2.x feature by
feature.
(b) Cherry-pick commits in branch-3.x one by one (skipping those that apply
only to 3.x).
I've summarized a "commits diff" between branch-3.x, branch-2.x and
cloudera/cdh-5.16.1-release:
https://docs.google.com/spreadsheets/d/12h1rTAPS1gm0vhlDGxeOXjnRD7rrOcoqzX4rjRRCyBg
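
As an illustration of how such a "commits diff" could be computed (a sketch only, not how the spreadsheet above was actually produced), commits can be matched across branches by their Gerrit Change-Id trailer, since the hash changes when a commit is cherry-picked. The function names are hypothetical. The input is assumed to be plain `git log <branch>` output without decorations, and cherry-picks are assumed to keep the original Change-Id:

```python
import re
from typing import Dict, List

# Gerrit adds a "Change-Id: I<40 hex chars>" trailer to each commit message.
CHANGE_ID = re.compile(r"^\s*Change-Id:\s*(I[0-9a-f]{40})\s*$", re.MULTILINE)


def change_ids(log_text: str) -> Dict[str, str]:
    """Map Change-Id -> commit hash for default-format `git log` output.

    Assumes every commit starts with a line of the form "commit <hash>".
    """
    result: Dict[str, str] = {}
    # Splitting on the header line with a capture group yields
    # [preamble, hash1, body1, hash2, body2, ...].
    parts = re.split(r"^commit ([0-9a-f]{7,40})$", log_text, flags=re.MULTILINE)
    for commit_hash, body in zip(parts[1::2], parts[2::2]):
        match = CHANGE_ID.search(body)
        if match:
            result[match.group(1)] = commit_hash
    return result


def missing_commits(newer_log: str, older_log: str) -> List[str]:
    """Hashes in the newer branch whose Change-Id is absent from the older one."""
    older = change_ids(older_log)
    return [h for cid, h in change_ids(newer_log).items() if cid not in older]
```

Feeding it the output of `git log` for the newer branch and for branch-2.x would list the commits that have no counterpart in branch-2.x, in the order `git log` printed them. Commits without a Change-Id trailer are ignored by this sketch.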

It shows that the Cloudera release follows (a) and picks up only a few
commits. However, it does pick up some commits in batches from branch-3.x
(e.g. the LocalCatalog commits). I think it's a good example of (a).

However, (a) needs more effort than (b). If we go with (b), we just
need to fix cherry-pick conflicts, run GVO, and then merge the commit if the
tests pass.
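
A minimal sketch of approach (b), assuming the commits have already been listed oldest first and that someone has marked by hand which ones apply only to 3.x. The `Commit` class and its `only_for_3x` flag are hypothetical; the only real command involved is `git cherry-pick -x`, which records the original hash in the new commit message. The sketch only builds the commands; resolving conflicts and running GVO remain manual steps:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Commit:
    sha: str
    subject: str
    only_for_3x: bool = False  # hypothetical flag, set by hand after reading the patch


def cherry_pick_plan(commits_oldest_first: Iterable[Commit]) -> List[str]:
    """Return one `git cherry-pick -x` command per commit that should be ported."""
    return [
        f"git cherry-pick -x {c.sha}  # {c.subject}"
        for c in commits_oldest_first
        if not c.only_for_3x  # skip commits that make sense only on 3.x
    ]


if __name__ == "__main__":
    # Placeholder hashes and subjects, for illustration only.
    example = [
        Commit("1111111", "some refactoring"),
        Commit("2222222", "a Hadoop 3 only change", only_for_3x=True),
        Commit("3333333", "a bug fix"),
    ]
    for command in cherry_pick_plan(example):
        print(command)
```

Keeping the skip decision as explicit data makes it easy to review which commits were left out and why before anything is pushed for review.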

What do you think? Could anyone share some experience about how other
projects (e.g. Hadoop, Hive, HBase) manage several branches together?

Thanks,
Quanlong Huang

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
Many thanks to everyone who helped review the previous patches!

Now we are at this patch: https://gerrit.cloudera.org/c/12546/. It's a
documentation patch, so I think anyone can help confirm it. Could someone
take a look?

Thanks,
Quanlong

On Sat, Feb 9, 2019 at 10:44 PM Quanlong Huang <hu...@gmail.com>
wrote:

> Hi all,
>
> We've moved to the next patch: https://gerrit.cloudera.org/c/12345/. It's
> the first step to add LocalCatalog to branch-2.x. Could anyone give it
> a +2? Or does anyone have objections to it?
>
> Thanks,
> Quanlong

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
Hi all,

We've moved to the next patch: https://gerrit.cloudera.org/c/12345/. It's
the first step to add LocalCatalog to branch-2.x. Could anyone give it
a +2? Or does anyone have objections to it?

Thanks,
Quanlong

On Thu, Jan 31, 2019 at 9:00 PM Quanlong Huang <hu...@gmail.com>
wrote:

> Sure. I think "fine-grained privileges" always introduces small changes in
> behavior, i.e. unprivileged users used to be able to do something but they
> can't do so after an upgrade. We accept it since it's reasonable.
>
> There are incompatible changes in the previous releases too:
> https://www.cloudera.com/documentation/enterprise/release-notes/topics/impala_incompatible_changes.html.
> What we need is to document it well :)
>
> I just moved forward and started a GVO job for the patch. Thanks!
>
>
>
> On Thu, Jan 31, 2019 at 2:00 AM Bharath Vissapragada <
> bharathv@cloudera.com> wrote:
>
>> On Wed, Jan 30, 2019 at 12:21 AM Quanlong Huang <hu...@gmail.com>
>> wrote:
>>
>> > I'm afraid the difference between branch-2.x and Cloudera's branch is
>> > larger than the difference between branch-2.x and master branch. Cloudera's
>> > branch already ignored lots of commits, which causes the gap. I've tried
>> > cherry-picking from both master and Cloudera's branch and found it's much
>> > easier to pick from master branch.
>> > If https://gerrit.cloudera.org/c/12292/ is merged, I can easily pick 40+
>> > commits into branch-2.x with few conflicts to resolve!! See the first
>> > column in the sheet:
>> >
>> > https://docs.google.com/spreadsheets/d/12h1rTAPS1gm0vhlDGxeOXjnRD7rrOcoqzX4rjRRCyBg
>> > The result is here:
>> > https://github.com/stiga-huang/incubator-impala/tree/future-2.x
>>
>>
>> Sure whatever works best for you. You are right that the Cloudera branch
>> was selective in cherry-picking stuff. We mostly focussed on "fetch
>> on-demand metadata" changes and "finer grained privileges" and ignored the
>> rest. If that is what you are looking for, it is easier to cherry-pick from
>> Cloudera's branch. Otherwise probably better to replay commits from the
>> master branch.
>>
>>
>> >
>> > We can restart the cherrypick-2.x-and-test Jenkins job. Each time
>> > there are conflicts, I'll come and resolve them. If the job keeps running,
>> > it's possible for branch-2.x to catch up with the master branch!
>> >
>> > Besides, https://gerrit.cloudera.org/c/12292/ is about DESCRIBE behavior in
>> > FGP (fine-grained privileges). I think it's reasonable and not
>> > compatibility breaking. Does anyone have more thoughts about this?
>> >
>>
>> Hmm, I see what you are saying.  Definitely helps to include it to make
>> future cherry-picks easier.
>>
>> Technically it is still a behavioral change, especially if someone upgrades
>> to a version with this fix and we typically try to avoid that (describe
>> that worked before doesn't work after upgrade). I can't speak about why we
>> included it in the Cloudera branch since that was an internal decision but
>> I don't know if we have any policies here around backporting such stuff
>> into older branches. Maybe good to know what others think.
>>
>> Anyway, I don't feel too strongly about this and since it is blocking your
>> work, I removed my -1 on the code review but this is something to keep in
>> mind when backporting such patches.
>>
>>
>> >
>> > On Wed, Jan 30, 2019 at 9:41 AM Fredy Wijaya <fw...@cloudera.com> wrote:
>> >
>> > > Due to the way we build 2.x where it can't use the pinned versions of CDH
>> > > dependencies, it may be better to cherry-pick all commits in
>> > > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0, which also
>> > > includes that DESCRIBE commit to avoid further integration issues later on.
>> > >
>> > > On Tue, Jan 29, 2019 at 5:05 PM Quanlong Huang <huangquanlong@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi Bharath,
>> > > >
>> > > > Thanks a lot for pointing this out! However, I've gone through the commits
>> > > > of the cdh branch before and found that this patch is also picked:
>> > > > https://github.com/cloudera/Impala/commit/f8a318d4f75e22a963b9cf4786ef07d2cd6bd93c.
>> > > > Is this really a compatibility breaking change?
>> > > >
>> > > > I'm also concerned that the TestDescribeTableResults it introduced is too
>> > > > strict and may cause trouble. However, I found two later commits
>> > > > (IMPALA-7143 and IMPALA-7144) that would fix this. I'm going to cherry-pick
>> > > > these two and IMPALA-7676 (thanks for Fredy's advice too!) right after
>> > > > https://gerrit.cloudera.org/c/12292/ is merged.
>> > > >
>> > > > Please let me know if this will go astray. Thanks!
>> > > >
>> > > >
>> > > > On Wed, Jan 30, 2019 at 12:36 AM Bharath Vissapragada <bharathv@cloudera.com>
>> > > > wrote:
>> > > >
>> > > > > On Mon, Jan 28, 2019 at 11:36 PM Quanlong Huang <huangquanlong@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > >>For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
>> > > > > > >>for Db, View, Table, Partition", the cherry-pick conflicts are due to the
>> > > > > > >>revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
>> > > > > > >>picked back. Does anyone know why we reverted it? (I also commented in the
>> > > > > > >>JIRA).
>> > > > > > >
>> > > > > > >There are test failures. I guess it's the reason. Hopefully,
>> > > > > > >cdh-5.16.1-release already picked up this patch, which provides some
>> > > > > > >pointers :)
>> > > > > >
>> > > > > > I fixed the test failures and created a review at
>> > > > > > https://gerrit.cloudera.org/c/12292/
>> > > > > > Waiting for Jenkins maintenance to finish so I can run a GVO. I hope
>> > > > > > someone can join and have a look!
>> > > > > >
>> > > > > > On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <huangquanlong@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
>> > > > > > > >for Db, View, Table, Partition", the cherry-pick conflicts are due to the
>> > > > > > > >revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
>> > > > > > > >picked back. Does anyone know why we reverted it? (I also commented in the
>> > > > > > > >JIRA).
>> > > > > >
>> > > > >
>> > > > > It was reverted because it is a compatibility breaking change. We typically
>> > > > > try not to introduce such behavioral changes in the same major version line
>> > > > > as that can cause upgrade issues.
>> > > > >
>> > > > >
>> > > > > > >
>> > > > > > > There are test failures. I guess it's the reason. Hopefully,
>> > > > > > > cdh-5.16.1-release already picked up this patch, which provides some
>> > > > > > > pointers :)
>> > > > > >
>> > > > >
>> > > > > I work at Cloudera and we've gone through this exercise before. It is
>> > > > > annoying to resolve the conflicts, so you can reuse our work and save some
>> > > > > time.
>> > > > > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0
>> > > > >
>> > > > >
>> > > > > > >
>> > > > > > > On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <huangquanlong@gmail.com>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > >> Yes, there are two earlier discussion threads that are related to this.
>> > > > > > >> One for stopping the cherrypick-2.x-and-test Jenkins job:
>> > > > > > >> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
>> > > > > > >>
>> > > > > > >> The other for removing support for Hadoop 2 in the master branch:
>> > > > > > >> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
>> > > > > > >>
>> > > > > > >> I'm +1 on the second thread: we only support Hadoop 2 in branch-2.x
>> > > > > > >> and support Hadoop 3 in the master branch, to be more focused. I also
>> > > > > > >> agree with Paul's concern. It's a real dilemma: if we skip some
>> > > > > > >> commits, things will get harder and harder as we move forward; if we
>> > > > > > >> cherry-pick, review, and test the commits one by one, branch-2.x will
>> > > > > > >> never catch up with the master branch, which is an obstacle if someone
>> > > > > > >> (like me) wants to backport his/her new patch to branch-2.x but waits
>> > > > > > >> too long and finally forgets the details of the patch.
>> > > > > > >>
>> > > > > > >> I roughly investigated how other systems deal with multiple branches. The
>> > > > > > >> effort to backport a patch can be as large as for the original patch. It's
>> > > > > > >> not easy, so the Hive community declares that
>> > > > > > >> "The decision to port a feature from master to branch-1 is at the
>> > > > > > >> discretion of the contributor and committer. However no features that
>> > > > > > >> break backwards compatibility will be accepted on branch-1."
>> > > > > > >>
>> > > > > > >> I think it's a chance to understand more parts of Impala by learning and
>> > > > > > >> backporting the patches, since they have excellent commit messages and
>> > > > > > >> were strictly reviewed. So I volunteer for the job of moving branch-2.x
>> > > > > > >> forward. I hope patch authors can give some pointers when I'm blocked!
>> > > > > > >> I'll try approach (b) first and switch to (a) when (b) becomes impossible
>> > > > > > >> after too many commits are skipped. I'll confirm with the author if I
>> > > > > > >> think a patch should be skipped.
>> > > > > > >>
>> > > > > > >> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
>> > > > > > >> for Db, View, Table, Partition", the cherry-pick conflicts are due to the
>> > > > > > >> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
>> > > > > > >> picked back. Does anyone know why we reverted it? (I also commented in the
>> > > > > > >> JIRA).
>> > > > > > >>
>> > > > > > >> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <philip@cloudera.com>
>> > > > > > >> wrote:
>> > > > > > >>
>> > > > > > >>> As for Quanlong's question, I think the answer is however the folks who
>> > > > > > >>> want to do the work prefer to do it. As you noticed in the CDH changelists,
>> > > > > > >>> Cloudera's distribution has opted for something more like approach (a),
>> > > > > > >>> choosing to backport individual features. For a while, we were doing
>> > > > > > >>> automation for cherry-picking things automatically, and it got tedious
>> > > > > > >>> enough that we decided to turn it off.
>> > > > > > >>>
>> > > > > > >>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <progers@cloudera.com>
>> > > > > > >>> wrote:
>> > > > > > >>>
>> > > > > > >>> > Hi Quanlong,
>> > > > > > >>> >
>> > > > > > >>> > Thanks for the suggestion. I wonder if there is a third strategy:
>> > > > > > >>> >
>> > > > > > >>> > c) Isolate the Hadoop 2.x/3.x differences into a clearly-defined driver
>> > > > > > >>> > layer so that basically all of 3.x can be applied to the 2.x branch. Said
>> > > > > > >>> > another way, a single source base can work against either Hadoop 2.x or
>> > > > > > >>> > 3.x, with the build (C++) or runtime (Java) choosing the proper “driver”
>> > > > > > >>> > classes.
>> > > > > > >>> >
>> > > > > > >>>
>> > > > > > >>> We had such a layer for a while, where Impala master could be built
>> > > > > > >>> against either Hadoop3 or Hadoop2. We decided to clean it up in commit
>> > > > > > >>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
>> > > > > > >>>
>> > > > > > >>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
>> > > > > > >>> Author: Fredy Wijaya <fw...@cloudera.com>
>> > > > > > >>> Date:   Thu Jul 12 17:01:13 2018 -0700
>> > > > > > >>>
>> > > > > > >>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
>> > > > > > >>>
>> > > > > > >>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
>> > > > > > >>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from
>> > > > > > >>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code
>> > > > > > >>>     changes in this patch, there is no code change for the shims. The shims
>> > > > > > >>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
>> > > > > > >>>     implementation.
>> > > > > > >>>
>> > > > > > >>>     Testing:
>> > > > > > >>>     - Ran core and exhaustive tests
>> > > > > > >>>
>> > > > > > >>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
>> > > > > > >>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
>> > > > > > >>>     Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
>> > > > > > >>>     Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
>> > > > > > >>>
>> > > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
Sure. I think "fine-grained privileges" always introduces small changes in
behavior, i.e. unprivileged users used to be able to do something but they
can't do so after an upgrade. We accept it since it's reasonable.

There are incompatible changes in the previous releases too:
https://www.cloudera.com/documentation/enterprise/release-notes/topics/impala_incompatible_changes.html.
What we need is to document it well :)

I just moved forward and started a GVO job for the patch. Thanks!



On Thu, Jan 31, 2019 at 2:00 AM Bharath Vissapragada <bh...@cloudera.com>
wrote:

> On Wed, Jan 30, 2019 at 12:21 AM Quanlong Huang <hu...@gmail.com>
> wrote:
>
> > I'm afraid the difference between branch-2.x and Cloudera's branch is
> > larger than the difference between branch-2.x and master branch. Cloudera's
> > branch already ignored lots of commits, which causes the gap. I've tried
> > cherry-picking from both master and Cloudera's branch and found it's much
> > easier to pick from master branch.
> > If https://gerrit.cloudera.org/c/12292/ is merged, I can easily pick 40+
> > commits into branch-2.x with few conflicts to resolve!! See the first
> > column in the sheet:
> >
> > https://docs.google.com/spreadsheets/d/12h1rTAPS1gm0vhlDGxeOXjnRD7rrOcoqzX4rjRRCyBg
> > The result is here:
> > https://github.com/stiga-huang/incubator-impala/tree/future-2.x
>
>
> Sure whatever works best for you. You are right that the Cloudera branch
> was selective in cherry-picking stuff. We mostly focussed on "fetch
> on-demand metadata" changes and "finer grained privileges" and ignored the
> rest. If that is what you are looking for, it is easier to cherry-pick from
> Cloudera's branch. Otherwise probably better to replay commits from the
> master branch.
>
>
> >
> > We can restart the cherrypick-2.x-and-test Jenkins job. Each time
> > there're conflicts, I'll come to resolve it. If the job keeps running,
> it's
> > possible for branch-2.x to catch up the master branch!
> >
> > Besides, https://gerrit.cloudera.org/c/12292/ is about DESCRIBE behavior
> > in
> > FGP(Fine-grained privileges). I think it's reasonable and not
> > compatibility breaking. Does anyone have more thoughts about this?
> >
>
> Hmm, I see what you are saying.  Definitely helps to include it to make
> future cherry-picks easier.
>
> Technically it is still a behavioral change, especially if someone upgrades
> to a version with this fix and we typically try to avoid that (describe
> that worked before doesn't work after upgrade). I can't speak about why we
> included it in the Cloudera branch since that was an internal decision but
> I don't know if we have any policies here around backporting such stuff
> into older branches. Maybe good to know what others think.
>
> Anyway, I don't feel too strongly about this and since it is blocking your
> work, I removed my -1 on the code review but this is something to keep in
> mind when backporting such patches.
>
>
> >
> > On Wed, Jan 30, 2019 at 9:41 AM Fredy Wijaya <fw...@cloudera.com>
> wrote:
> >
> > > Due to the way we build 2.x where it can't use the pinned versions of
> CDH
> > > dependencies, it may be better to cherry-pick all commits in
> > > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0, which also
> > > includes that DESCRIBE commit to avoid further integration issues later
> > on.
> > >
> > > On Tue, Jan 29, 2019 at 5:05 PM Quanlong Huang <
> huangquanlong@gmail.com>
> > > wrote:
> > >
> > > > Hi Bharath,
> > > >
> > > > Thank you a lot for your notice! However, I've gone through the
> commits
> > > of
> > > > cdh branch before and found that this patch is also picked:
> > > >
> > > >
> > >
> >
> https://github.com/cloudera/Impala/commit/f8a318d4f75e22a963b9cf4786ef07d2cd6bd93c
> > > > .
> > > > Is this really a compatibility breaking change?
> > > >
> > > > > I'm also concerned that the TestDescribeTableResults it introduced is
> too
> > > > > strict, which may cause trouble. However, I found two later commits
> > > > (IMPALA-7143 and IMPALA-7144) would fix this. I'm going to
> cherry-pick
> > > > these two and IMPALA-7676 (thanks Fredy's advise too!) right after
> > > > https://gerrit.cloudera.org/c/12292/ is merged.
> > > >
> > > > Please let me know if this will go astray. Thanks!
> > > >
> > > >
> > > > On Wed, Jan 30, 2019 at 12:36 AM Bharath Vissapragada <
> > > > bharathv@cloudera.com>
> > > > wrote:
> > > >
> > > > > On Mon, Jan 28, 2019 at 11:36 PM Quanlong Huang <
> > > huangquanlong@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > >>For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > > > interfaces
> > > > > > for Db, View, Table, Partition", the cherry-pick conflicts is due
> > to
> > > > the
> > > > > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> > IMPALA-6479
> > > > > being
> > > > > > picked back. Does anyone know why we revert it? (I also comment
> in
> > > the
> > > > > > JIRA).
> > > > > > >
> > > > > > >There are test failures. I guess it's the reason. Hopefully,
> > > > > > cdh-5.16.1-release already picked up this patch, which provides
> > some
> > > > > > pointers :)
> > > > > >
> > > > > > I fix the test failures and create a review at
> > > > > > https://gerrit.cloudera.org/c/12292/
> > > > > > Waiting for Jenkins maintenance to finish and then run a GVO.
> Hopes
> > > > > someone
> > > > > > can join and have a look!
> > > > > >
> > > > > > On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <
> > > > huangquanlong@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > > > interfaces
> > > > > > > for Db, View, Table, Partition", the cherry-pick conflicts is
> due
> > > to
> > > > > the
> > > > > > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> > > IMPALA-6479
> > > > > > being
> > > > > > > picked back. Does anyone know why we revert it? (I also comment
> > in
> > > > the
> > > > > > > JIRA).
> > > > > >
> > > > >
> > > > > It was reverted because it is a compatibility breaking change. We
> > > > typically
> > > > > try not to introduce such behavioral changes in the same major
> > version
> > > > line
> > > > > as that can cause upgrade issues.
> > > > >
> > > > >
> > > > > > >
> > > > > > > There are test failures. I guess it's the reason. Hopefully,
> > > > > > > cdh-5.16.1-release already picked up this patch, which provides
> > > some
> > > > > > > pointers :)
> > > > > >
> > > > >
> > > > > I work at Cloudera and we've gone through this exercise before. It
> is
> > > > > annoying to resolve the conflicts, so you can reuse our work and
> save
> > > > some
> > > > > time.
> > > > > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0
> > > > >
> > > > >
> > > > > > >
> > > > > > > On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <
> > > > > huangquanlong@gmail.com
> > > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > >> Yes, there are two discussion threads before that are relative
> > to
> > > > > this.
> > > > > > >> One for stopping the cherrypick-2.x-and-test jenkins job:
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
> > > > > > >>
> > > > > > >> The other for removing support for hadoop 2 in master branch:
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
> > > > > > >>
> > > > > > >> I'm +1 with the second thread that we only support Hadoop 2 in
> > > > > > branch-2.x
> > > > > > >> and support Hadoop 3 in the master branch to be more focused.
> > I'm
> > > > also
> > > > > > >> agree with Paul's concern. It's such a dilemma that if we skip
> > > some
> > > > > > >> commits, things will be harder and harder as we moving
> forward;
> > if
> > > > we
> > > > > > >> cherry-pick, review, and test the commits one by one,
> branch-2.x
> > > > will
> > > > > > never
> > > > > > >> catch up the master branch, which is an obstacle if someone
> > (like
> > > > me)
> > > > > > wants
> > > > > > >> to backport his/her new patch to branch-2.x but waits too long
> > and
> > > > > > finally
> > > > > > > >> forgets details of the patch.
> > > > > > >>
> > > > > > >> I roughly investigated how other systems deal with multiple
> > > > branches.
> > > > > > The
> > > > > > >> efforts to backport a patch could be the same for the original
> > > > patch.
> > > > > > It's
> > > > > > > >> not an easy task, so the Hive community declares that
> > > > > > >> "The decision to port a feature from master to branch-1 is at
> > the
> > > > > > >> discretion of the contributor and committer. However no
> features
> > > > that
> > > > > > break
> > > > > > >> backwards compatibility will be accepted on branch-1."
> > > > > > >>
> > > > > > >> I think it's a chance to understand more parts of Impala by
> > > learning
> > > > > and
> > > > > > > >> backporting the patches, since they have excellent commit
> > > messages
> > > > > and
> > > > > > >> were strictly reviewed. So I volunteer for the job to move
> > forward
> > > > the
> > > > > > >> branch-2.x. Hopes patch authors could give some pointers when
> > I'm
> > > > > > blocked!
> > > > > > >> I'll try approach (b) first and switch to (a) when (b) becomes
> > > > > > impossible
> > > > > > >> after too many commits are skipped. I'll confirm with the
> author
> > > if
> > > > I
> > > > > > think
> > > > > > >> a patch should be skipped.
> > > > > > >>
> > > > > > >> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > > > interfaces
> > > > > > >> for Db, View, Table, Partition", the cherry-pick conflicts is
> > due
> > > to
> > > > > the
> > > > > > >> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> > > > IMPALA-6479
> > > > > > being
> > > > > > >> picked back. Does anyone know why we revert it? (I also
> comment
> > in
> > > > the
> > > > > > >> JIRA).
> > > > > > >>
> > > > > > >> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <
> > > > philip@cloudera.com
> > > > > >
> > > > > > >> wrote:
> > > > > > >>
> > > > > > >>> As for Quanlong's question, I think the answer is however the
> > > folks
> > > > > who
> > > > > > >>> want to do the work prefer to do it. As you noticed in the
> CDH
> > > > > > >>> changelists,
> > > > > > >>> Cloudera's distribution has opted for something more like
> > > approach
> > > > > (a),
> > > > > > >>> choosing to backport individual features. For a while, we
> were
> > > > doing
> > > > > > >>> automation for cherry-picking things automatically, and it
> got
> > > > > tedious
> > > > > > >>> enough that we decided to turn it off.
> > > > > > >>>
> > > > > > >>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <
> > > progers@cloudera.com>
> > > > > > >>> wrote:
> > > > > > >>>
> > > > > > >>> > Hi Quanlong,
> > > > > > >>> >
> > > > > > >>> > Thanks for the suggestion. I wonder if there is a third
> > > strategy:
> > > > > > >>> >
> > > > > > >>> > c) Isolate the Hadoop 2.x/3.x differences into
> > clearly-defined
> > > > > driver
> > > > > > >>> > layer so that basically all of 3.x can be applied to the
> 2.x
> > > > > branch.
> > > > > > >>> Said
> > > > > > >>> > another way, a single source base can work against either
> > > Hadoop
> > > > > 2.x
> > > > > > or
> > > > > > >>> > 3.x, with the build (C++) or runtime (Java) choosing the
> > proper
> > > > > > >>> “driver”
> > > > > > >>> > classes.
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>> We had such a layer for a while, where Impala master could be
> > > built
> > > > > > >>> against
> > > > > > >>> either Hadoop3 or Hadoop2. We decided to clean it up in
> commit
> > > > > > >>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
> > > > > > >>>
> > > > > > >>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
> > > > > > >>> Author: Fredy Wijaya <fw...@cloudera.com>
> > > > > > >>> Date:   Thu Jul 12 17:01:13 2018 -0700
> > > > > > >>>
> > > > > > >>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
> > > > > > >>>
> > > > > > >>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE.
> > The
> > > > > code
> > > > > > >>> that
> > > > > > >>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it
> > defaults
> > > to
> > > > > > code
> > > > > > >>> from
> > > > > > >>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having
> too
> > > > many
> > > > > > code
> > > > > > >>>     changes in this patch, there is no code change for the
> > shims.
> > > > The
> > > > > > >>> shims
> > > > > > >>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the
> > > > default
> > > > > > >>>     implementation.
> > > > > > >>>
> > > > > > >>>     Testing:
> > > > > > >>>     - Ran core and exhaustive tests
> > > > > > >>>
> > > > > > >>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
> > > > > > >>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
> > > > > > >>>     Reviewed-by: Impala Public Jenkins <
> > > > > > >>> impala-public-jenkins@cloudera.com>
> > > > > > >>>     Tested-by: Impala Public Jenkins <
> > > > > > impala-public-jenkins@cloudera.com
> > > > > > >>> >
> > > > > > >>>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Move forward branch-2.x

Posted by Bharath Vissapragada <bh...@cloudera.com>.
On Wed, Jan 30, 2019 at 12:21 AM Quanlong Huang <hu...@gmail.com>
wrote:

> I'm afraid the difference between branch-2.x and Cloudera's branch is
> larger than the difference between branch-2.x and master branch. Cloudera's
> branch already ignored lots of commits, which causes the gap. I've tried
> cherry-pick from master or Cloudera's branch and found it's much easier to
> pick from master branch.
> If https://gerrit.cloudera.org/c/12292/ is merged, I can easily pick 40+
> commits into branch-2.x with few conflicts to resolve!! See the first
> column in the sheet:
>
> https://docs.google.com/spreadsheets/d/12h1rTAPS1gm0vhlDGxeOXjnRD7rrOcoqzX4rjRRCyBg
> The result is here:
> https://github.com/stiga-huang/incubator-impala/tree/future-2.x


Sure, whatever works best for you. You are right that the Cloudera branch
was selective in cherry-picking stuff. We mostly focussed on "fetch
on-demand metadata" changes and "finer grained privileges" and ignored the
rest. If that is what you are looking for, it is easier to cherry-pick from
Cloudera's branch. Otherwise it is probably better to replay commits from
the master branch.


>
> We can restart the cherrypick-2.x-and-test Jenkins job. Each time
> there're conflicts, I'll come to resolve it. If the job keeps running, it's
> possible for branch-2.x to catch up the master branch!
>
> Besides, https://gerrit.cloudera.org/c/12292/ is about DESCRIBE behavior
> in
> FGP(Fine-grained privileges). I think it's reasonable and not
> compatibility breaking. Does anyone have more thoughts about this?
>

Hmm, I see what you are saying. It definitely helps to include it to make
future cherry-picks easier.

Technically it is still a behavioral change for anyone who upgrades to a
version with this fix (a DESCRIBE that worked before stops working after
the upgrade), and we typically try to avoid that. I can't speak to why we
included it in the Cloudera branch, since that was an internal decision,
and I don't know whether we have any policy here about backporting such
changes into older branches. It would be good to know what others think.

Anyway, I don't feel too strongly about this, and since it is blocking your
work, I removed my -1 on the code review. Still, this is something to keep
in mind when backporting such patches.


>
> On Wed, Jan 30, 2019 at 9:41 AM Fredy Wijaya <fw...@cloudera.com> wrote:
>
> > Due to the way we build 2.x where it can't use the pinned versions of CDH
> > dependencies, it may be better to cherry-pick all commits in
> > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0, which also
> > includes that DESCRIBE commit to avoid further integration issues later
> on.
> >
> > On Tue, Jan 29, 2019 at 5:05 PM Quanlong Huang <hu...@gmail.com>
> > wrote:
> >
> > > Hi Bharath,
> > >
> > > Thank you a lot for your notice! However, I've gone through the commits
> > of
> > > cdh branch before and found that this patch is also picked:
> > >
> > >
> >
> https://github.com/cloudera/Impala/commit/f8a318d4f75e22a963b9cf4786ef07d2cd6bd93c
> > > .
> > > Is this really a compatibility breaking change?
> > >
> > > I'm also concerned that the TestDescribeTableResults it introduced is too
> > > strict, which may cause trouble. However, I found two later commits
> > > (IMPALA-7143 and IMPALA-7144) would fix this. I'm going to cherry-pick
> > > these two and IMPALA-7676 (thanks Fredy's advise too!) right after
> > > https://gerrit.cloudera.org/c/12292/ is merged.
> > >
> > > Please let me know if this will go astray. Thanks!
> > >
> > >
> > > On Wed, Jan 30, 2019 at 12:36 AM Bharath Vissapragada <
> > > bharathv@cloudera.com>
> > > wrote:
> > >
> > > > On Mon, Jan 28, 2019 at 11:36 PM Quanlong Huang <
> > huangquanlong@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > >>For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > > interfaces
> > > > > for Db, View, Table, Partition", the cherry-pick conflicts is due
> to
> > > the
> > > > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> IMPALA-6479
> > > > being
> > > > > picked back. Does anyone know why we revert it? (I also comment in
> > the
> > > > > JIRA).
> > > > > >
> > > > > >There are test failures. I guess it's the reason. Hopefully,
> > > > > cdh-5.16.1-release already picked up this patch, which provides
> some
> > > > > pointers :)
> > > > >
> > > > > I fix the test failures and create a review at
> > > > > https://gerrit.cloudera.org/c/12292/
> > > > > Waiting for Jenkins maintenance to finish and then run a GVO. Hopes
> > > > someone
> > > > > can join and have a look!
> > > > >
> > > > > On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <
> > > huangquanlong@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > > interfaces
> > > > > > for Db, View, Table, Partition", the cherry-pick conflicts is due
> > to
> > > > the
> > > > > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> > IMPALA-6479
> > > > > being
> > > > > > picked back. Does anyone know why we revert it? (I also comment
> in
> > > the
> > > > > > JIRA).
> > > > >
> > > >
> > > > It was reverted because it is a compatibility breaking change. We
> > > typically
> > > > try not to introduce such behavioral changes in the same major
> version
> > > line
> > > > as that can cause upgrade issues.
> > > >
> > > >
> > > > > >
> > > > > > There are test failures. I guess it's the reason. Hopefully,
> > > > > > cdh-5.16.1-release already picked up this patch, which provides
> > some
> > > > > > pointers :)
> > > > >
> > > >
> > > > I work at Cloudera and we've gone through this exercise before. It is
> > > > annoying to resolve the conflicts, so you can reuse our work and save
> > > some
> > > > time.
> > > > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0
> > > >
> > > >
> > > > > >
> > > > > > On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <
> > > > huangquanlong@gmail.com
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Yes, there are two discussion threads before that are relative
> to
> > > > this.
> > > > > >> One for stopping the cherrypick-2.x-and-test jenkins job:
> > > > > >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
> > > > > >>
> > > > > >> The other for removing support for hadoop 2 in master branch:
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
> > > > > >>
> > > > > >> I'm +1 with the second thread that we only support Hadoop 2 in
> > > > > branch-2.x
> > > > > >> and support Hadoop 3 in the master branch to be more focused.
> I'm
> > > also
> > > > > >> agree with Paul's concern. It's such a dilemma that if we skip
> > some
> > > > > >> commits, things will be harder and harder as we moving forward;
> if
> > > we
> > > > > >> cherry-pick, review, and test the commits one by one, branch-2.x
> > > will
> > > > > never
> > > > > >> catch up the master branch, which is an obstacle if someone
> (like
> > > me)
> > > > > wants
> > > > > >> to backport his/her new patch to branch-2.x but waits too long
> and
> > > > > finally
> > > > > > >> forgets details of the patch.
> > > > > >>
> > > > > >> I roughly investigated how other systems deal with multiple
> > > branches.
> > > > > The
> > > > > >> efforts to backport a patch could be the same for the original
> > > patch.
> > > > > It's
> > > > > > >> not an easy task, so the Hive community declares that
> > > > > >> "The decision to port a feature from master to branch-1 is at
> the
> > > > > >> discretion of the contributor and committer. However no features
> > > that
> > > > > break
> > > > > >> backwards compatibility will be accepted on branch-1."
> > > > > >>
> > > > > >> I think it's a chance to understand more parts of Impala by
> > learning
> > > > and
> > > > > > >> backporting the patches, since they have excellent commit
> > messages
> > > > and
> > > > > >> were strictly reviewed. So I volunteer for the job to move
> forward
> > > the
> > > > > >> branch-2.x. Hopes patch authors could give some pointers when
> I'm
> > > > > blocked!
> > > > > >> I'll try approach (b) first and switch to (a) when (b) becomes
> > > > > impossible
> > > > > >> after too many commits are skipped. I'll confirm with the author
> > if
> > > I
> > > > > think
> > > > > >> a patch should be skipped.
> > > > > >>
> > > > > >> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > > interfaces
> > > > > >> for Db, View, Table, Partition", the cherry-pick conflicts is
> due
> > to
> > > > the
> > > > > >> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> > > IMPALA-6479
> > > > > being
> > > > > >> picked back. Does anyone know why we revert it? (I also comment
> in
> > > the
> > > > > >> JIRA).
> > > > > >>
> > > > > >> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <
> > > philip@cloudera.com
> > > > >
> > > > > >> wrote:
> > > > > >>
> > > > > >>> As for Quanlong's question, I think the answer is however the
> > folks
> > > > who
> > > > > >>> want to do the work prefer to do it. As you noticed in the CDH
> > > > > >>> changelists,
> > > > > >>> Cloudera's distribution has opted for something more like
> > approach
> > > > (a),
> > > > > >>> choosing to backport individual features. For a while, we were
> > > doing
> > > > > >>> automation for cherry-picking things automatically, and it got
> > > > tedious
> > > > > >>> enough that we decided to turn it off.
> > > > > >>>
> > > > > >>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <
> > progers@cloudera.com>
> > > > > >>> wrote:
> > > > > >>>
> > > > > >>> > Hi Quanlong,
> > > > > >>> >
> > > > > >>> > Thanks for the suggestion. I wonder if there is a third
> > strategy:
> > > > > >>> >
> > > > > >>> > c) Isolate the Hadoop 2.x/3.x differences into
> clearly-defined
> > > > driver
> > > > > >>> > layer so that basically all of 3.x can be applied to the 2.x
> > > > branch.
> > > > > >>> Said
> > > > > >>> > another way, a single source base can work against either
> > Hadoop
> > > > 2.x
> > > > > or
> > > > > >>> > 3.x, with the build (C++) or runtime (Java) choosing the
> proper
> > > > > >>> “driver”
> > > > > >>> > classes.
> > > > > >>> >
> > > > > >>>
> > > > > >>> We had such a layer for a while, where Impala master could be
> > built
> > > > > >>> against
> > > > > >>> either Hadoop3 or Hadoop2. We decided to clean it up in commit
> > > > > >>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
> > > > > >>>
> > > > > >>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
> > > > > >>> Author: Fredy Wijaya <fw...@cloudera.com>
> > > > > >>> Date:   Thu Jul 12 17:01:13 2018 -0700
> > > > > >>>
> > > > > >>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
> > > > > >>>
> > > > > >>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE.
> The
> > > > code
> > > > > >>> that
> > > > > >>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it
> defaults
> > to
> > > > > code
> > > > > >>> from
> > > > > >>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too
> > > many
> > > > > code
> > > > > >>>     changes in this patch, there is no code change for the
> shims.
> > > The
> > > > > >>> shims
> > > > > >>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the
> > > default
> > > > > >>>     implementation.
> > > > > >>>
> > > > > >>>     Testing:
> > > > > >>>     - Ran core and exhaustive tests
> > > > > >>>
> > > > > >>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
> > > > > >>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
> > > > > >>>     Reviewed-by: Impala Public Jenkins <
> > > > > >>> impala-public-jenkins@cloudera.com>
> > > > > >>>     Tested-by: Impala Public Jenkins <
> > > > > impala-public-jenkins@cloudera.com
> > > > > >>> >
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
>

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
I'm afraid the difference between branch-2.x and Cloudera's branch is
larger than the difference between branch-2.x and the master branch.
Cloudera's branch already skipped lots of commits, which causes the gap.
I've tried cherry-picking from both master and Cloudera's branch and found
it's much easier to pick from master.
If https://gerrit.cloudera.org/c/12292/ is merged, I can easily pick 40+
commits into branch-2.x with few conflicts to resolve! See the first
column in the sheet:
https://docs.google.com/spreadsheets/d/12h1rTAPS1gm0vhlDGxeOXjnRD7rrOcoqzX4rjRRCyBg
The result is here:
https://github.com/stiga-huang/incubator-impala/tree/future-2.x

We can restart the cherrypick-2.x-and-test Jenkins job. Each time there
are conflicts, I'll come and resolve them. If the job keeps running, it's
possible for branch-2.x to catch up with the master branch!
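
To make the bookkeeping behind picking commits one by one concrete, here
is a minimal sketch in Python. It is not the actual cherrypick-2.x-and-test
job; the function names (change_id, pending_picks) and the skip set are
hypothetical. It only assumes that every commit carries a Gerrit
"Change-Id:" footer, as in the commit messages quoted in this thread, and
that the footer is kept when a commit is cherry-picked, so a master commit
can be matched to its copy on branch-2.x even though the SHAs differ.

```python
import re

# Gerrit footer, e.g. "Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08"
CHANGE_ID_RE = re.compile(r"^[ \t]*Change-Id:[ \t]*(I[0-9a-f]{40})[ \t]*$",
                          re.MULTILINE)


def change_id(message):
    """Return the last Change-Id footer in a commit message, or None."""
    ids = CHANGE_ID_RE.findall(message)
    return ids[-1] if ids else None


def pending_picks(master, branch, skip=frozenset()):
    """List the master commits that still have to be cherry-picked.

    master, branch: lists of (sha, message) pairs, oldest commit first.
    skip: Change-Ids deliberately left out (e.g. Hadoop 3 only changes).
    Commits without a Change-Id cannot be matched, so they stay in the
    queue for a manual decision.
    """
    done = {change_id(msg) for _, msg in branch}
    done.discard(None)
    queue = []
    for sha, msg in master:
        cid = change_id(msg)
        if cid is not None and (cid in done or cid in skip):
            continue
        queue.append(sha)
    return queue
```

The (sha, message) pairs could come from git log on each branch. The
returned SHAs keep master's order, which matters because later commits
tend to depend on earlier ones.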

Besides, https://gerrit.cloudera.org/c/12292/ is about DESCRIBE behavior in
FGP (fine-grained privileges). I think it's reasonable and not
compatibility-breaking. Does anyone have more thoughts about this?

On Wed, Jan 30, 2019 at 9:41 AM Fredy Wijaya <fw...@cloudera.com> wrote:

> Due to the way we build 2.x where it can't use the pinned versions of CDH
> dependencies, it may be better to cherry-pick all commits in
> https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0, which also
> includes that DESCRIBE commit to avoid further integration issues later on.
>
> On Tue, Jan 29, 2019 at 5:05 PM Quanlong Huang <hu...@gmail.com>
> wrote:
>
> > Hi Bharath,
> >
> > Thank you a lot for your notice! However, I've gone through the commits
> of
> > cdh branch before and found that this patch is also picked:
> >
> >
> https://github.com/cloudera/Impala/commit/f8a318d4f75e22a963b9cf4786ef07d2cd6bd93c
> > .
> > Is this really a compatibility breaking change?
> >
> > I'm also concerned that the TestDescribeTableResults it introduced is too
> > strict, which may cause trouble. However, I found two later commits
> > (IMPALA-7143 and IMPALA-7144) would fix this. I'm going to cherry-pick
> > these two and IMPALA-7676 (thanks Fredy's advise too!) right after
> > https://gerrit.cloudera.org/c/12292/ is merged.
> >
> > Please let me know if this will go astray. Thanks!
> >
> >
> > On Wed, Jan 30, 2019 at 12:36 AM Bharath Vissapragada <
> > bharathv@cloudera.com>
> > wrote:
> >
> > > On Mon, Jan 28, 2019 at 11:36 PM Quanlong Huang <
> huangquanlong@gmail.com
> > >
> > > wrote:
> > >
> > > > >>For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > interfaces
> > > > for Db, View, Table, Partition", the cherry-pick conflicts is due to
> > the
> > > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> > > being
> > > > picked back. Does anyone know why we revert it? (I also comment in
> the
> > > > JIRA).
> > > > >
> > > > >There are test failures. I guess it's the reason. Hopefully,
> > > > cdh-5.16.1-release already picked up this patch, which provides some
> > > > pointers :)
> > > >
> > > > I fix the test failures and create a review at
> > > > https://gerrit.cloudera.org/c/12292/
> > > > Waiting for Jenkins maintenance to finish and then run a GVO. Hopes
> > > someone
> > > > can join and have a look!
> > > >
> > > > On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <
> > huangquanlong@gmail.com>
> > > > wrote:
> > > >
> > > > > >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > interfaces
> > > > > for Db, View, Table, Partition", the cherry-pick conflicts is due
> to
> > > the
> > > > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> IMPALA-6479
> > > > being
> > > > > picked back. Does anyone know why we revert it? (I also comment in
> > the
> > > > > JIRA).
> > > >
> > >
> > > It was reverted because it is a compatibility breaking change. We
> > typically
> > > try not to introduce such behavioral changes in the same major version
> > line
> > > as that can cause upgrade issues.
> > >
> > >
> > > > >
> > > > > There are test failures. I guess it's the reason. Hopefully,
> > > > > cdh-5.16.1-release already picked up this patch, which provides
> some
> > > > > pointers :)
> > > >
> > >
> > > I work at Cloudera and we've gone through this exercise before. It is
> > > annoying to resolve the conflicts, so you can reuse our work and save
> > some
> > > time.
> > > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0
> > >
> > >
> > > > >
> > > > > On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <
> > > huangquanlong@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > >> Yes, there are two discussion threads before that are relative to
> > > this.
> > > > >> One for stopping the cherrypick-2.x-and-test jenkins job:
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
> > > > >>
> > > > >> The other for removing support for hadoop 2 in master branch:
> > > > >>
> > > >
> > >
> >
> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
> > > > >>
> > > > >> I'm +1 with the second thread that we only support Hadoop 2 in
> > > > branch-2.x
> > > > >> and support Hadoop 3 in the master branch to be more focused. I'm
> > also
> > > > >> agree with Paul's concern. It's such a dilemma that if we skip
> some
> > > > >> commits, things will be harder and harder as we moving forward; if
> > we
> > > > >> cherry-pick, review, and test the commits one by one, branch-2.x
> > will
> > > > never
> > > > >> catch up the master branch, which is an obstacle if someone (like
> > me)
> > > > wants
> > > > >> to backport his/her new patch to branch-2.x but waits too long and
> > > > finally
> > > > > >> forgets details of the patch.
> > > > >>
> > > > >> I roughly investigated how other systems deal with multiple
> > branches.
> > > > The
> > > > >> efforts to backport a patch could be the same for the original
> > patch.
> > > > It's
> > > > > >> not an easy task, so the Hive community declares that
> > > > >> "The decision to port a feature from master to branch-1 is at the
> > > > >> discretion of the contributor and committer. However no features
> > that
> > > > break
> > > > >> backwards compatibility will be accepted on branch-1."
> > > > >>
> > > > >> I think it's a chance to understand more parts of Impala by
> learning
> > > and
> > > > > >> backporting the patches, since they have excellent commit
> messages
> > > and
> > > > >> were strictly reviewed. So I volunteer for the job to move forward
> > the
> > > > >> branch-2.x. Hopes patch authors could give some pointers when I'm
> > > > blocked!
> > > > >> I'll try approach (b) first and switch to (a) when (b) becomes
> > > > impossible
> > > > >> after too many commits are skipped. I'll confirm with the author
> if
> > I
> > > > think
> > > > >> a patch should be skipped.
> > > > >>
> > > > >> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > > interfaces
> > > > >> for Db, View, Table, Partition", the cherry-pick conflicts is due
> to
> > > the
> > > > >> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> > IMPALA-6479
> > > > being
> > > > >> picked back. Does anyone know why we revert it? (I also comment in
> > the
> > > > >> JIRA).
> > > > >>
> > > > >> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <
> > philip@cloudera.com
> > > >
> > > > >> wrote:
> > > > >>
> > > > >>> As for Quanlong's question, I think the answer is however the
> folks
> > > who
> > > > >>> want to do the work prefer to do it. As you noticed in the CDH
> > > > >>> changelists,
> > > > >>> Cloudera's distribution has opted for something more like
> approach
> > > (a),
> > > > >>> choosing to backport individual features. For a while, we were
> > doing
> > > > >>> automation for cherry-picking things automatically, and it got
> > > tedious
> > > > >>> enough that we decided to turn it off.
> > > > >>>
> > > > >>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <
> progers@cloudera.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>> > Hi Quanlong,
> > > > >>> >
> > > > >>> > Thanks for the suggestion. I wonder if there is a third
> strategy:
> > > > >>> >
> > > > >>> > c) Isolate the Hadoop 2.x/3.x differences into clearly-defined
> > > driver
> > > > >>> > layer so that basically all of 3.x can be applied to the 2.x
> > > branch.
> > > > >>> Said
> > > > >>> > another way, a single source base can work against either
> Hadoop
> > > 2.x
> > > > or
> > > > >>> > 3.x, with the build (C++) or runtime (Java) choosing the proper
> > > > >>> “driver”
> > > > >>> > classes.
> > > > >>> >
> > > > >>>
> > > > >>> We had such a layer for a while, where Impala master could be
> built
> > > > >>> against
> > > > >>> either Hadoop3 or Hadoop2. We decided to clean it up in commit
> > > > >>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
> > > > >>>
> > > > >>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
> > > > >>> Author: Fredy Wijaya <fw...@cloudera.com>
> > > > >>> Date:   Thu Jul 12 17:01:13 2018 -0700
> > > > >>>
> > > > >>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
> > > > >>>
> > > > >>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The
> > > code
> > > > >>> that
> > > > >>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults
> to
> > > > code
> > > > >>> from
> > > > >>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too
> > many
> > > > code
> > > > >>>     changes in this patch, there is no code change for the shims.
> > The
> > > > >>> shims
> > > > >>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the
> > default
> > > > >>>     implementation.
> > > > >>>
> > > > >>>     Testing:
> > > > >>>     - Ran core and exhaustive tests
> > > > >>>
> > > > >>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
> > > > >>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
> > > > >>>     Reviewed-by: Impala Public Jenkins <
> > > > >>> impala-public-jenkins@cloudera.com>
> > > > >>>     Tested-by: Impala Public Jenkins <
> > > > impala-public-jenkins@cloudera.com
> > > > >>> >
> > > > >>>
> > > > >>
> > > >
> > >
> >
>

Re: Move forward branch-2.x

Posted by Fredy Wijaya <fw...@cloudera.com>.
Because of the way we build 2.x, which can't use the pinned versions of the
CDH dependencies, it may be better to cherry-pick all commits in
https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0, which also
includes that DESCRIBE commit, to avoid further integration issues later on.
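
One possible way to work out which commits of a source branch are still
missing from branch-2.x is to compare Gerrit Change-Id trailers rather than
SHAs, since a cherry-pick gets a new SHA. The Python sketch below is only an
illustration under that assumption: it assumes cherry-picked commits keep
their original Change-Id line, and the function names (change_id,
missing_commits) are hypothetical helpers, not part of any Impala tooling.

```python
import re

# Gerrit appends a "Change-Id: I<40 hex digits>" trailer to commit messages.
CHANGE_ID_RE = re.compile(r"^\s*Change-Id:\s*(I[0-9a-f]{40})\s*$", re.MULTILINE)


def change_id(message):
    """Return the last Change-Id trailer in a commit message, or None."""
    matches = CHANGE_ID_RE.findall(message)
    return matches[-1] if matches else None


def missing_commits(source, target):
    """List (sha, change_id) for source commits not yet present in target.

    source and target are lists of (sha, message) tuples, oldest first.
    Commits are matched by Change-Id (assumption: cherry-picks keep it).
    A source commit without a Change-Id cannot be matched, so it is
    reported with change_id None and left for a human to check.
    """
    picked = {change_id(msg) for _, msg in target}
    picked.discard(None)
    result = []
    for sha, msg in source:
        cid = change_id(msg)
        if cid is None or cid not in picked:
            result.append((sha, cid))
    return result
```

The (sha, message) pairs could be collected from git log on each branch; the
resulting list keeps the source order, so the commits can be reviewed and
cherry-picked oldest first.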

On Tue, Jan 29, 2019 at 5:05 PM Quanlong Huang <hu...@gmail.com>
wrote:

> Hi Bharath,
>
> Thank you a lot for your notice! However, I've gone through the commits of
> cdh branch before and found that this patch is also picked:
>
> https://github.com/cloudera/Impala/commit/f8a318d4f75e22a963b9cf4786ef07d2cd6bd93c
> .
> Is this really a compatibility breaking change?
>
> I'm also concern that the TestDescribeTableResults it introduced is too
> strictly that may cause troubles. However, I found two later commits
> (IMPALA-7143 and IMPALA-7144) would fix this. I'm going to cherry-pick
> these two and IMPALA-7676 (thanks Fredy's advise too!) right after
> https://gerrit.cloudera.org/c/12292/ is merged.
>
> Please let me know if this will go astray. Thanks!
>
>
> On Wed, Jan 30, 2019 at 12:36 AM Bharath Vissapragada <
> bharathv@cloudera.com>
> wrote:
>
> > On Mon, Jan 28, 2019 at 11:36 PM Quanlong Huang <huangquanlong@gmail.com
> >
> > wrote:
> >
> > > >>For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > interfaces
> > > for Db, View, Table, Partition", the cherry-pick conflicts is due to
> the
> > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> > being
> > > picked back. Does anyone know why we revert it? (I also comment in the
> > > JIRA).
> > > >
> > > >There are test failures. I guess it's the reason. Hopefully,
> > > cdh-5.16.1-release already picked up this patch, which provides some
> > > pointers :)
> > >
> > > I fix the test failures and create a review at
> > > https://gerrit.cloudera.org/c/12292/
> > > Waiting for Jenkins maintenance to finish and then run a GVO. Hopes
> > someone
> > > can join and have a look!
> > >
> > > On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <
> huangquanlong@gmail.com>
> > > wrote:
> > >
> > > > >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > interfaces
> > > > for Db, View, Table, Partition", the cherry-pick conflicts is due to
> > the
> > > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> > > being
> > > > picked back. Does anyone know why we revert it? (I also comment in
> the
> > > > JIRA).
> > >
> >
> > It was reverted because it is a compatibility breaking change. We
> typically
> > try not to introduce such behavioral changes in the same major version
> line
> > as that can cause upgrade issues.
> >
> >
> > > >
> > > > There are test failures. I guess it's the reason. Hopefully,
> > > > cdh-5.16.1-release already picked up this patch, which provides some
> > > > pointers :)
> > >
> >
> > I work at Cloudera and we've gone through this exercise before. It is
> > annoying to resolve the conflicts, so you can reuse our work and save
> some
> > time.
> > https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0
> >
> >
> > > >
> > > > On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <
> > huangquanlong@gmail.com
> > > >
> > > > wrote:
> > > >
> > > >> Yes, there are two discussion threads before that are relative to
> > this.
> > > >> One for stopping the cherrypick-2.x-and-test jenkins job:
> > > >>
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
> > > >>
> > > >> The other for removing support for hadoop 2 in master branch:
> > > >>
> > >
> >
> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
> > > >>
> > > >> I'm +1 with the second thread that we only support Hadoop 2 in
> > > branch-2.x
> > > >> and support Hadoop 3 in the master branch to be more focused. I'm
> also
> > > >> agree with Paul's concern. It's such a dilemma that if we skip some
> > > >> commits, things will be harder and harder as we moving forward; if
> we
> > > >> cherry-pick, review, and test the commits one by one, branch-2.x
> will
> > > never
> > > >> catch up the master branch, which is an obstacle if someone (like
> me)
> > > wants
> > > >> to backport his/her new patch to branch-2.x but waits too long and
> > > finally
> > > >> fogets details of the patch.
> > > >>
> > > >> I roughly investigated how other systems deal with multiple
> branches.
> > > The
> > > >> efforts to backport a patch could be the same for the original
> patch.
> > > It's
> > > >> not a easy go, so the Hive community declares that
> > > >> "The decision to port a feature from master to branch-1 is at the
> > > >> discretion of the contributor and committer. However no features
> that
> > > break
> > > >> backwards compatibility will be accepted on branch-1."
> > > >>
> > > >> I think it's a chance to understand more parts of Impala by learning
> > and
> > > >> backporting the patches, since they have execellent commit messages
> > and
> > > >> were strictly reviewed. So I volunteer for the job to move forward
> the
> > > >> branch-2.x. Hopes patch authors could give some pointers when I'm
> > > blocked!
> > > >> I'll try approach (b) first and switch to (a) when (b) becomes
> > > impossible
> > > >> after too many commits are skipped. I'll confirm with the author if
> I
> > > think
> > > >> a patch should be skipped.
> > > >>
> > > >> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> > interfaces
> > > >> for Db, View, Table, Partition", the cherry-pick conflicts is due to
> > the
> > > >> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with
> IMPALA-6479
> > > being
> > > >> picked back. Does anyone know why we revert it? (I also comment in
> the
> > > >> JIRA).
> > > >>
> > > >> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <
> philip@cloudera.com
> > >
> > > >> wrote:
> > > >>
> > > >>> As for Quanlong's question, I think the answer is however the folks
> > who
> > > >>> want to do the work prefer to do it. As you noticed in the CDH
> > > >>> changelists,
> > > >>> Cloudera's distribution has opted for something more like approach
> > (a),
> > > >>> choosing to backport individual features. For a while, we were
> doing
> > > >>> automation for cherry-picking things automatically, and it got
> > tedious
> > > >>> enough that we decided to turn it off.
> > > >>>
> > > >>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <pr...@cloudera.com>
> > > >>> wrote:
> > > >>>
> > > >>> > Hi Quanlong,
> > > >>> >
> > > >>> > Thanks for the suggestion. I wonder if there is a third strategy:
> > > >>> >
> > > >>> > c) Isolate the Hadoop 2.x/3.x differences into clearly-defined
> > driver
> > > >>> > layer so that basically all of 3.x can be applied to the 2.x
> > branch.
> > > >>> Said
> > > >>> > another way, a single source base can work against either Hadoop
> > 2.x
> > > or
> > > >>> > 3.x, with the build (C++) or runtime (Java) choosing the proper
> > > >>> “driver”
> > > >>> > classes.
> > > >>> >
> > > >>>
> > > >>> We had such a layer for a while, where Impala master could be built
> > > >>> against
> > > >>> either Hadoop3 or Hadoop2. We decided to clean it up in commit
> > > >>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
> > > >>>
> > > >>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
> > > >>> Author: Fredy Wijaya <fw...@cloudera.com>
> > > >>> Date:   Thu Jul 12 17:01:13 2018 -0700
> > > >>>
> > > >>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
> > > >>>
> > > >>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The
> > code
> > > >>> that
> > > >>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to
> > > code
> > > >>> from
> > > >>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too
> many
> > > code
> > > >>>     changes in this patch, there is no code change for the shims.
> The
> > > >>> shims
> > > >>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the
> default
> > > >>>     implementation.
> > > >>>
> > > >>>     Testing:
> > > >>>     - Ran core and exhaustive tests
> > > >>>
> > > >>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
> > > >>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
> > > >>>     Reviewed-by: Impala Public Jenkins <
> > > >>> impala-public-jenkins@cloudera.com>
> > > >>>     Tested-by: Impala Public Jenkins <
> > > impala-public-jenkins@cloudera.com
> > > >>> >
> > > >>>
> > > >>
> > >
> >
>

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
Hi Bharath,

Thanks a lot for pointing this out! However, I went through the commits of
the CDH branch earlier and found that this patch was also picked there:
https://github.com/cloudera/Impala/commit/f8a318d4f75e22a963b9cf4786ef07d2cd6bd93c
Is this really a compatibility-breaking change?

I'm also concerned that the TestDescribeTableResults test it introduced is
too strict and may cause trouble. However, I found two later commits
(IMPALA-7143 and IMPALA-7144) that should fix this. I'm going to cherry-pick
these two and IMPALA-7676 (thanks for Fredy's advice too!) right after
https://gerrit.cloudera.org/c/12292/ is merged.

Please let me know if this is going in the wrong direction. Thanks!


On Wed, Jan 30, 2019 at 12:36 AM Bharath Vissapragada <bh...@cloudera.com>
wrote:

> On Mon, Jan 28, 2019 at 11:36 PM Quanlong Huang <hu...@gmail.com>
> wrote:
>
> > >>For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> interfaces
> > for Db, View, Table, Partition", the cherry-pick conflicts is due to the
> > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> being
> > picked back. Does anyone know why we revert it? (I also comment in the
> > JIRA).
> > >
> > >There are test failures. I guess it's the reason. Hopefully,
> > cdh-5.16.1-release already picked up this patch, which provides some
> > pointers :)
> >
> > I fix the test failures and create a review at
> > https://gerrit.cloudera.org/c/12292/
> > Waiting for Jenkins maintenance to finish and then run a GVO. Hopes
> someone
> > can join and have a look!
> >
> > On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <hu...@gmail.com>
> > wrote:
> >
> > > >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> interfaces
> > > for Db, View, Table, Partition", the cherry-pick conflicts is due to
> the
> > > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> > being
> > > picked back. Does anyone know why we revert it? (I also comment in the
> > > JIRA).
> >
>
> It was reverted because it is a compatibility breaking change. We typically
> try not to introduce such behavioral changes in the same major version line
> as that can cause upgrade issues.
>
>
> > >
> > > There are test failures. I guess it's the reason. Hopefully,
> > > cdh-5.16.1-release already picked up this patch, which provides some
> > > pointers :)
> >
>
> I work at Cloudera and we've gone through this exercise before. It is
> annoying to resolve the conflicts, so you can reuse our work and save some
> time.
> https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0
>
>
> > >
> > > On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <
> huangquanlong@gmail.com
> > >
> > > wrote:
> > >
> > >> Yes, there are two discussion threads before that are relative to
> this.
> > >> One for stopping the cherrypick-2.x-and-test jenkins job:
> > >>
> > >>
> >
> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
> > >>
> > >> The other for removing support for hadoop 2 in master branch:
> > >>
> >
> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
> > >>
> > >> I'm +1 with the second thread that we only support Hadoop 2 in
> > branch-2.x
> > >> and support Hadoop 3 in the master branch to be more focused. I'm also
> > >> agree with Paul's concern. It's such a dilemma that if we skip some
> > >> commits, things will be harder and harder as we moving forward; if we
> > >> cherry-pick, review, and test the commits one by one, branch-2.x will
> > never
> > >> catch up the master branch, which is an obstacle if someone (like me)
> > wants
> > >> to backport his/her new patch to branch-2.x but waits too long and
> > finally
> > >> fogets details of the patch.
> > >>
> > >> I roughly investigated how other systems deal with multiple branches.
> > The
> > >> efforts to backport a patch could be the same for the original patch.
> > It's
> > >> not a easy go, so the Hive community declares that
> > >> "The decision to port a feature from master to branch-1 is at the
> > >> discretion of the contributor and committer. However no features that
> > break
> > >> backwards compatibility will be accepted on branch-1."
> > >>
> > >> I think it's a chance to understand more parts of Impala by learning
> and
> > >> backporting the patches, since they have execellent commit messages
> and
> > >> were strictly reviewed. So I volunteer for the job to move forward the
> > >> branch-2.x. Hopes patch authors could give some pointers when I'm
> > blocked!
> > >> I'll try approach (b) first and switch to (a) when (b) becomes
> > impossible
> > >> after too many commits are skipped. I'll confirm with the author if I
> > think
> > >> a patch should be skipped.
> > >>
> > >> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor
> interfaces
> > >> for Db, View, Table, Partition", the cherry-pick conflicts is due to
> the
> > >> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> > being
> > >> picked back. Does anyone know why we revert it? (I also comment in the
> > >> JIRA).
> > >>
> > >> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <philip@cloudera.com
> >
> > >> wrote:
> > >>
> > >>> As for Quanlong's question, I think the answer is however the folks
> who
> > >>> want to do the work prefer to do it. As you noticed in the CDH
> > >>> changelists,
> > >>> Cloudera's distribution has opted for something more like approach
> (a),
> > >>> choosing to backport individual features. For a while, we were doing
> > >>> automation for cherry-picking things automatically, and it got
> tedious
> > >>> enough that we decided to turn it off.
> > >>>
> > >>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <pr...@cloudera.com>
> > >>> wrote:
> > >>>
> > >>> > Hi Quanlong,
> > >>> >
> > >>> > Thanks for the suggestion. I wonder if there is a third strategy:
> > >>> >
> > >>> > c) Isolate the Hadoop 2.x/3.x differences into clearly-defined
> driver
> > >>> > layer so that basically all of 3.x can be applied to the 2.x
> branch.
> > >>> Said
> > >>> > another way, a single source base can work against either Hadoop
> 2.x
> > or
> > >>> > 3.x, with the build (C++) or runtime (Java) choosing the proper
> > >>> “driver”
> > >>> > classes.
> > >>> >
> > >>>
> > >>> We had such a layer for a while, where Impala master could be built
> > >>> against
> > >>> either Hadoop3 or Hadoop2. We decided to clean it up in commit
> > >>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
> > >>>
> > >>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
> > >>> Author: Fredy Wijaya <fw...@cloudera.com>
> > >>> Date:   Thu Jul 12 17:01:13 2018 -0700
> > >>>
> > >>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
> > >>>
> > >>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The
> code
> > >>> that
> > >>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to
> > code
> > >>> from
> > >>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many
> > code
> > >>>     changes in this patch, there is no code change for the shims. The
> > >>> shims
> > >>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
> > >>>     implementation.
> > >>>
> > >>>     Testing:
> > >>>     - Ran core and exhaustive tests
> > >>>
> > >>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
> > >>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
> > >>>     Reviewed-by: Impala Public Jenkins <
> > >>> impala-public-jenkins@cloudera.com>
> > >>>     Tested-by: Impala Public Jenkins <
> > impala-public-jenkins@cloudera.com
> > >>> >
> > >>>
> > >>
> >
>

Re: Move forward branch-2.x

Posted by Bharath Vissapragada <bh...@cloudera.com>.
On Mon, Jan 28, 2019 at 11:36 PM Quanlong Huang <hu...@gmail.com>
wrote:

> >>For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
> for Db, View, Table, Partition", the cherry-pick conflicts is due to the
> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
> picked back. Does anyone know why we revert it? (I also comment in the
> JIRA).
> >
> >There are test failures. I guess it's the reason. Hopefully,
> cdh-5.16.1-release already picked up this patch, which provides some
> pointers :)
>
> I fix the test failures and create a review at
> https://gerrit.cloudera.org/c/12292/
> Waiting for Jenkins maintenance to finish and then run a GVO. Hopes someone
> can join and have a look!
>
> On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <hu...@gmail.com>
> wrote:
>
> > >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
> > for Db, View, Table, Partition", the cherry-pick conflicts is due to the
> > revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> being
> > picked back. Does anyone know why we revert it? (I also comment in the
> > JIRA).
>

It was reverted because it is a compatibility-breaking change. We typically
try not to introduce such behavioral changes in the same major version line,
as that can cause upgrade issues.


> >
> > There are test failures. I guess it's the reason. Hopefully,
> > cdh-5.16.1-release already picked up this patch, which provides some
> > pointers :)
>

I work at Cloudera and we've gone through this exercise before. It is
annoying to resolve the conflicts, so you can reuse our work and save some
time.
https://github.com/cloudera/Impala/tree/cdh5-2.12.0_5.16.0


> >
> > On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <huangquanlong@gmail.com
> >
> > wrote:
> >
> >> Yes, there are two discussion threads before that are relative to this.
> >> One for stopping the cherrypick-2.x-and-test jenkins job:
> >>
> >>
> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
> >>
> >> The other for removing support for hadoop 2 in master branch:
> >>
> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
> >>
> >> I'm +1 with the second thread that we only support Hadoop 2 in
> branch-2.x
> >> and support Hadoop 3 in the master branch to be more focused. I'm also
> >> agree with Paul's concern. It's such a dilemma that if we skip some
> >> commits, things will be harder and harder as we moving forward; if we
> >> cherry-pick, review, and test the commits one by one, branch-2.x will
> never
> >> catch up the master branch, which is an obstacle if someone (like me)
> wants
> >> to backport his/her new patch to branch-2.x but waits too long and
> finally
> >> fogets details of the patch.
> >>
> >> I roughly investigated how other systems deal with multiple branches.
> The
> >> efforts to backport a patch could be the same for the original patch.
> It's
> >> not a easy go, so the Hive community declares that
> >> "The decision to port a feature from master to branch-1 is at the
> >> discretion of the contributor and committer. However no features that
> break
> >> backwards compatibility will be accepted on branch-1."
> >>
> >> I think it's a chance to understand more parts of Impala by learning and
> >> backporting the patches, since they have execellent commit messages and
> >> were strictly reviewed. So I volunteer for the job to move forward the
> >> branch-2.x. Hopes patch authors could give some pointers when I'm
> blocked!
> >> I'll try approach (b) first and switch to (a) when (b) becomes
> impossible
> >> after too many commits are skipped. I'll confirm with the author if I
> think
> >> a patch should be skipped.
> >>
> >> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
> >> for Db, View, Table, Partition", the cherry-pick conflicts is due to the
> >> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
> being
> >> picked back. Does anyone know why we revert it? (I also comment in the
> >> JIRA).
> >>
> >> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <ph...@cloudera.com>
> >> wrote:
> >>
> >>> As for Quanlong's question, I think the answer is however the folks who
> >>> want to do the work prefer to do it. As you noticed in the CDH
> >>> changelists,
> >>> Cloudera's distribution has opted for something more like approach (a),
> >>> choosing to backport individual features. For a while, we were doing
> >>> automation for cherry-picking things automatically, and it got tedious
> >>> enough that we decided to turn it off.
> >>>
> >>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <pr...@cloudera.com>
> >>> wrote:
> >>>
> >>> > Hi Quanlong,
> >>> >
> >>> > Thanks for the suggestion. I wonder if there is a third strategy:
> >>> >
> >>> > c) Isolate the Hadoop 2.x/3.x differences into clearly-defined driver
> >>> > layer so that basically all of 3.x can be applied to the 2.x branch.
> >>> Said
> >>> > another way, a single source base can work against either Hadoop 2.x
> or
> >>> > 3.x, with the build (C++) or runtime (Java) choosing the proper
> >>> “driver”
> >>> > classes.
> >>> >
> >>>
> >>> We had such a layer for a while, where Impala master could be built
> >>> against
> >>> either Hadoop3 or Hadoop2. We decided to clean it up in commit
> >>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
> >>>
> >>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
> >>> Author: Fredy Wijaya <fw...@cloudera.com>
> >>> Date:   Thu Jul 12 17:01:13 2018 -0700
> >>>
> >>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
> >>>
> >>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code
> >>> that
> >>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to
> code
> >>> from
> >>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many
> code
> >>>     changes in this patch, there is no code change for the shims. The
> >>> shims
> >>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
> >>>     implementation.
> >>>
> >>>     Testing:
> >>>     - Ran core and exhaustive tests
> >>>
> >>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
> >>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
> >>>     Reviewed-by: Impala Public Jenkins <
> >>> impala-public-jenkins@cloudera.com>
> >>>     Tested-by: Impala Public Jenkins <
> impala-public-jenkins@cloudera.com
> >>> >
> >>>
> >>
>

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
>> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
>> for Db, View, Table, Partition", the cherry-pick conflicts is due to the
>> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
>> picked back. Does anyone know why we revert it? (I also comment in the
>> JIRA).
>
> There are test failures. I guess it's the reason. Hopefully,
> cdh-5.16.1-release already picked up this patch, which provides some
> pointers :)

I fixed the test failures and created a review at
https://gerrit.cloudera.org/c/12292/
I'm waiting for the Jenkins maintenance to finish and will then run a GVO.
I hope someone can join and have a look!

On Tue, Jan 29, 2019 at 7:39 AM Quanlong Huang <hu...@gmail.com>
wrote:

> >For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
> for Db, View, Table, Partition", the cherry-pick conflicts is due to the
> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
> picked back. Does anyone know why we revert it? (I also comment in the
> JIRA).
>
> There are test failures. I guess it's the reason. Hopefully,
> cdh-5.16.1-release already picked up this patch, which provides some
> pointers :)
>
> On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <hu...@gmail.com>
> wrote:
>
>> Yes, there are two discussion threads before that are relative to this.
>> One for stopping the cherrypick-2.x-and-test jenkins job:
>>
>> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
>>
>> The other for removing support for hadoop 2 in master branch:
>> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
>>
>> I'm +1 with the second thread that we only support Hadoop 2 in branch-2.x
>> and support Hadoop 3 in the master branch to be more focused. I'm also
>> agree with Paul's concern. It's such a dilemma that if we skip some
>> commits, things will be harder and harder as we moving forward; if we
>> cherry-pick, review, and test the commits one by one, branch-2.x will never
>> catch up the master branch, which is an obstacle if someone (like me) wants
>> to backport his/her new patch to branch-2.x but waits too long and finally
>> fogets details of the patch.
>>
>> I roughly investigated how other systems deal with multiple branches. The
>> efforts to backport a patch could be the same for the original patch. It's
>> not a easy go, so the Hive community declares that
>> "The decision to port a feature from master to branch-1 is at the
>> discretion of the contributor and committer. However no features that break
>> backwards compatibility will be accepted on branch-1."
>>
>> I think it's a chance to understand more parts of Impala by learning and
>> backporting the patches, since they have execellent commit messages and
>> were strictly reviewed. So I volunteer for the job to move forward the
>> branch-2.x. Hopes patch authors could give some pointers when I'm blocked!
>> I'll try approach (b) first and switch to (a) when (b) becomes impossible
>> after too many commits are skipped. I'll confirm with the author if I think
>> a patch should be skipped.
>>
>> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
>> for Db, View, Table, Partition", the cherry-pick conflicts is due to the
>> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
>> picked back. Does anyone know why we revert it? (I also comment in the
>> JIRA).
>>
>> On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <ph...@cloudera.com>
>> wrote:
>>
>>> As for Quanlong's question, I think the answer is however the folks who
>>> want to do the work prefer to do it. As you noticed in the CDH
>>> changelists,
>>> Cloudera's distribution has opted for something more like approach (a),
>>> choosing to backport individual features. For a while, we were doing
>>> automation for cherry-picking things automatically, and it got tedious
>>> enough that we decided to turn it off.
>>>
>>> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <pr...@cloudera.com>
>>> wrote:
>>>
>>> > Hi Quanlong,
>>> >
>>> > Thanks for the suggestion. I wonder if there is a third strategy:
>>> >
>>> > c) Isolate the Hadoop 2.x/3.x differences into clearly-defined driver
>>> > layer so that basically all of 3.x can be applied to the 2.x branch.
>>> Said
>>> > another way, a single source base can work against either Hadoop 2.x or
>>> > 3.x, with the build (C++) or runtime (Java) choosing the proper
>>> “driver”
>>> > classes.
>>> >
>>>
>>> We had such a layer for a while, where Impala master could be built
>>> against
>>> either Hadoop3 or Hadoop2. We decided to clean it up in commit
>>> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
>>>
>>> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
>>> Author: Fredy Wijaya <fw...@cloudera.com>
>>> Date:   Thu Jul 12 17:01:13 2018 -0700
>>>
>>>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
>>>
>>>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code
>>> that
>>>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code
>>> from
>>>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code
>>>     changes in this patch, there is no code change for the shims. The
>>> shims
>>>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
>>>     implementation.
>>>
>>>     Testing:
>>>     - Ran core and exhaustive tests
>>>
>>>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
>>>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
>>>     Reviewed-by: Impala Public Jenkins <
>>> impala-public-jenkins@cloudera.com>
>>>     Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com
>>> >
>>>
>>

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
> for Db, View, Table, Partition", the cherry-pick conflicts is due to the
> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
> picked back. Does anyone know why we revert it? (I also comment in the
> JIRA).

There are test failures; I guess that's the reason. Fortunately,
cdh-5.16.1-release already picked up this patch, which provides some
pointers :)

On Mon, Jan 28, 2019 at 10:51 PM Quanlong Huang <hu...@gmail.com>
wrote:

> Yes, there are two discussion threads before that are relative to this.
> One for stopping the cherrypick-2.x-and-test jenkins job:
>
> https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E
>
> The other for removing support for hadoop 2 in master branch:
> https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E
>
> I'm +1 with the second thread that we only support Hadoop 2 in branch-2.x
> and support Hadoop 3 in the master branch to be more focused. I'm also
> agree with Paul's concern. It's such a dilemma that if we skip some
> commits, things will be harder and harder as we moving forward; if we
> cherry-pick, review, and test the commits one by one, branch-2.x will never
> catch up the master branch, which is an obstacle if someone (like me) wants
> to backport his/her new patch to branch-2.x but waits too long and finally
> fogets details of the patch.
>
> I roughly investigated how other systems deal with multiple branches. The
> efforts to backport a patch could be the same for the original patch. It's
> not a easy go, so the Hive community declares that
> "The decision to port a feature from master to branch-1 is at the
> discretion of the contributor and committer. However no features that break
> backwards compatibility will be accepted on branch-1."
>
> I think it's a chance to understand more parts of Impala by learning and
> backporting the patches, since they have execellent commit messages and
> were strictly reviewed. So I volunteer for the job to move forward the
> branch-2.x. Hopes patch authors could give some pointers when I'm blocked!
> I'll try approach (b) first and switch to (a) when (b) becomes impossible
> after too many commits are skipped. I'll confirm with the author if I think
> a patch should be skipped.
>
> For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
> for Db, View, Table, Partition", the cherry-pick conflicts is due to the
> revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479 being
> picked back. Does anyone know why we revert it? (I also comment in the
> JIRA).

Re: Move forward branch-2.x

Posted by Quanlong Huang <hu...@gmail.com>.
Yes, there are two earlier discussion threads related to this. One is
about stopping the cherrypick-2.x-and-test Jenkins job:
https://lists.apache.org/thread.html/2b4b62d4c07661b27a5203618cb0425a429f6460f2eb505acbcd26c6@%3Cdev.impala.apache.org%3E

The other is about removing support for Hadoop 2 in the master branch:
https://lists.apache.org/thread.html/49f9b68ed3d6d2c0fdee16a877b259922545e4824e1233479227a657@%3Cdev.impala.apache.org%3E

I'm +1 on the second thread: we only support Hadoop 2 in branch-2.x and
support Hadoop 3 in the master branch, to stay focused. I also agree with
Paul's concern. It's a dilemma: if we skip some commits, things will get
harder and harder as we move forward; if we cherry-pick, review, and test
the commits one by one, branch-2.x will never catch up with the master
branch. That is an obstacle if someone (like me) wants to backport his/her
new patch to branch-2.x but has to wait too long and finally forgets the
details of the patch.

I roughly investigated how other systems deal with multiple branches. The
effort to backport a patch can be as large as the effort for the original
patch. It's not easy, so the Hive community declares that
"The decision to port a feature from master to branch-1 is at the
discretion of the contributor and committer. However no features that break
backwards compatibility will be accepted on branch-1."

I think it's a chance to understand more parts of Impala by studying and
backporting the patches, since they have excellent commit messages and were
strictly reviewed. So I volunteer for the job of moving branch-2.x forward.
I hope patch authors can give some pointers when I'm blocked! I'll try
approach (b) first and switch to (a) when (b) becomes impossible after too
many commits are skipped. I'll confirm with the author if I think a patch
should be skipped.
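
Approach (b) can be sketched as a small planning step. The Python below is
only an illustration, not an existing Impala tool: Commit, plan_backports,
and should_switch_to_feature_picks are hypothetical names, the 25% threshold
is arbitrary, and matching commits by Gerrit Change-Id (instead of by hash,
which changes on cherry-pick) is an assumption. The Change-Id given for
0b540b025 is a placeholder, and treating IMPALA-7295 as a 3.x-only commit is
also an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Commit:
    sha: str        # hash on the source branch (branch-3.x)
    change_id: str  # Gerrit Change-Id taken from the commit message
    subject: str


def plan_backports(source_commits: Iterable[Commit],
                   picked_change_ids: Set[str],
                   skip_change_ids: Set[str]) -> List[Commit]:
    """Return the commits still to cherry-pick, in source order (oldest first)."""
    plan = []
    for commit in source_commits:
        if commit.change_id in picked_change_ids:
            continue  # already present on branch-2.x
        if commit.change_id in skip_change_ids:
            continue  # 3.x-only, skip confirmed with the author
        plan.append(commit)
    return plan


def should_switch_to_feature_picks(skipped: int, total: int,
                                   max_skip_ratio: float = 0.25) -> bool:
    """Hypothetical rule for falling back from approach (b) to approach (a)."""
    return total > 0 and skipped / total > max_skip_ratio


# Illustrative data only; the first Change-Id is a placeholder.
COMMITS = [
    Commit("0b540b025", "I-placeholder-7128-part1",
           "IMPALA-7128 (part 1) Refactor interfaces for Db, View, Table, Partition"),
    Commit("e4ae605b0", "Iba4a81165b3d2012dc04d4115454372c41e39f08",
           "IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2"),
]
SKIP = {"Iba4a81165b3d2012dc04d4115454372c41e39f08"}

plan = plan_backports(COMMITS, picked_change_ids=set(), skip_change_ids=SKIP)
for c in plan:
    # Each planned commit would then be applied by hand with git cherry-pick,
    # conflicts resolved, and the result sent through review and tests.
    print(f"git cherry-pick -x {c.sha}  # {c.subject}")
```

Keeping the skip list explicit makes it easy to see when skipped commits pile
up, which is the point at which (b) stops being practical.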

For the first patch, "0b540b025 IMPALA-7128 (part 1) Refactor interfaces
for Db, View, Table, Partition", the cherry-pick conflict is due to the
revert of IMPALA-6479 in 2.x. I'm testing branch-2.x with IMPALA-6479
picked back. Does anyone know why we reverted it? (I also commented on the
JIRA.)

On Mon, Jan 28, 2019 at 12:43 PM Philip Zeyliger <ph...@cloudera.com>
wrote:

> As for Quanlong's question, I think the answer is however the folks who
> want to do the work prefer to do it. As you noticed in the CDH changelists,
> Cloudera's distribution has opted for something more like approach (a),
> choosing to backport individual features. For a while, we were doing
> automation for cherry-picking things automatically, and it got tedious
> enough that we decided to turn it off.
>
> On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <pr...@cloudera.com> wrote:
>
> > Hi Quanlong,
> >
> > Thanks for the suggestion. I wonder if there is a third strategy:
> >
> > c) Isolate the Hadoop 2.x/3.x differences into clearly-defined driver
> > layer so that basically all of 3.x can be applied to the 2.x branch. Said
> > another way, a single source base can work against either Hadoop 2.x or
> > 3.x, with the build (C++) or runtime (Java) choosing the proper “driver”
> > classes.
> >
>
> We had such a layer for a while, where Impala master could be built against
> either Hadoop3 or Hadoop2. We decided to clean it up in commit
> e4ae605b083ab536c68552e37ca3c46f6bff4c76.
>
> commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
> Author: Fredy Wijaya <fw...@cloudera.com>
> Date:   Thu Jul 12 17:01:13 2018 -0700
>
>     IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2
>
>     This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
>     uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from
>     IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code
>     changes in this patch, there is no code change for the shims. The shims
>     for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
>     implementation.
>
>     Testing:
>     - Ran core and exhaustive tests
>
>     Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
>     Reviewed-on: http://gerrit.cloudera.org:8080/10940
>     Reviewed-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
>     Tested-by: Impala Public Jenkins <im...@cloudera.com>
>

Re: Move forward branch-2.x

Posted by Philip Zeyliger <ph...@cloudera.com>.
As for Quanlong's question, I think the answer is however the folks who
want to do the work prefer to do it. As you noticed in the CDH changelists,
Cloudera's distribution has opted for something more like approach (a),
choosing to backport individual features. For a while, we had automation
that cherry-picked changes for us, and it got tedious enough that we
decided to turn it off.

On Sun, Jan 27, 2019 at 7:37 PM Paul Rogers <pr...@cloudera.com> wrote:

> Hi Quanlong,
>
> Thanks for the suggestion. I wonder if there is a third strategy:
>
> c) Isolate the Hadoop 2.x/3.x differences into clearly-defined driver
> layer so that basically all of 3.x can be applied to the 2.x branch. Said
> another way, a single source base can work against either Hadoop 2.x or
> 3.x, with the build (C++) or runtime (Java) choosing the proper “driver”
> classes.
>

We had such a layer for a while, where Impala master could be built against
either Hadoop3 or Hadoop2. We decided to clean it up in commit
e4ae605b083ab536c68552e37ca3c46f6bff4c76.

commit e4ae605b083ab536c68552e37ca3c46f6bff4c76
Author: Fredy Wijaya <fw...@cloudera.com>
Date:   Thu Jul 12 17:01:13 2018 -0700

    IMPALA-7295: Remove IMPALA_MINICLUSTER_PROFILE=2

    This patch removes the use of IMPALA_MINICLUSTER_PROFILE. The code that
    uses IMPALA_MINICLUSTER_PROFILE=2 is removed and it defaults to code from
    IMPALA_MINICLUSTER_PROFILE=3. In order to reduce having too many code
    changes in this patch, there is no code change for the shims. The shims
    for IMPALA_MINICLUSTER_PROFILE=3 automatically become the default
    implementation.

    Testing:
    - Ran core and exhaustive tests

    Change-Id: Iba4a81165b3d2012dc04d4115454372c41e39f08
    Reviewed-on: http://gerrit.cloudera.org:8080/10940
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>

Re: Move forward branch-2.x

Posted by Paul Rogers <pr...@cloudera.com>.
Hi Quanlong,

Thanks for the suggestion. I wonder if there is a third strategy:

c) Isolate the Hadoop 2.x/3.x differences into a clearly defined driver layer so that basically all of 3.x can be applied to the 2.x branch. Said another way, a single source base can work against either Hadoop 2.x or 3.x, with the build (C++) or runtime (Java) choosing the proper “driver” classes.

This is the method used by Oracle, Informix and others back in the days when dozens of companies had their own “Unix standard.”
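
A minimal sketch of such a driver layer follows, written in Python for brevity even though the code in question is C++ and Java. Every name in it (HadoopDriver, Hadoop2Driver, Hadoop3Driver, load_driver, describe_storage) is hypothetical and is not an Impala class. The single capability shown, HDFS erasure coding, is a feature that arrived with Hadoop 3; a real layer would need to cover far more than this.

```python
from abc import ABC, abstractmethod


class HadoopDriver(ABC):
    """Everything version-specific lives behind this interface."""

    @abstractmethod
    def major_version(self) -> int:
        ...

    @abstractmethod
    def supports_erasure_coding(self) -> bool:
        ...


class Hadoop2Driver(HadoopDriver):
    def major_version(self) -> int:
        return 2

    def supports_erasure_coding(self) -> bool:
        return False


class Hadoop3Driver(HadoopDriver):
    def major_version(self) -> int:
        return 3

    def supports_erasure_coding(self) -> bool:
        return True


# Registry keyed by Hadoop major version.
_DRIVERS = {2: Hadoop2Driver, 3: Hadoop3Driver}


def load_driver(hadoop_version: str) -> HadoopDriver:
    """Pick the driver at runtime from a version string such as '3.0.0'."""
    major = int(hadoop_version.split(".", 1)[0])
    driver_cls = _DRIVERS.get(major)
    if driver_cls is None:
        raise ValueError(f"no driver for Hadoop {hadoop_version}")
    return driver_cls()


def describe_storage(driver: HadoopDriver) -> str:
    """Shared code: asks the driver instead of checking versions itself."""
    ec = "available" if driver.supports_erasure_coding() else "unavailable"
    return f"Hadoop {driver.major_version()}: erasure coding {ec}"


print(describe_storage(load_driver("2.6.0")))
print(describe_storage(load_driver("3.0.0")))
```

The shared function never branches on the Hadoop version, so in this scheme a change to it would apply to both lines without a separate backport.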

Anyone know the dependencies that differ between 2.x and 3.x? I’d guess they are large: HDFS, HMS, HBase, Hive and more… I wonder how hard it would be to factor those out of the code into a driver layer. What would be the cost of doing that vs. the cost of maintaining two divergent branches?

I’d be concerned that so many changes have gone into the 3.x branch that cherry-picking will get progressively more difficult, especially if commits are skipped. I saw this recently when we tried to back-port a recent patch on the 3.x branch to the 2.x branch.

Thanks,

- Paul

> On Jan 27, 2019, at 7:09 PM, Quanlong Huang <hu...@gmail.com> wrote:
> 
> Hi friends,
> 
> It's time to move forward the branch-2.x. Though we've made great
> features/improvements in Impala-3.x, people’s impression of Impala is still
> in the 2.x era. Most of them still using Hadoop2 in production and have no
> choices to try Impala-3.x. I believe Hadoop2 will still be used for some
> years. It's a pity if we lose those users.
> 
> I'd like to have a try to move forward branch-2.x. Hopes you can give some
> suggestions! There're two proposals I can come up with:
> (a) Cherry-pick mature improvements/features into branch-2.x feature by
> feature.
> (b) Cherry-pick commits in branch-3.x one by one (skip those just for 3.x)
> 
> I summarize a "commits diff" between branch-3.x, branch-2.x and
> cloudera/cdh-5.16.1-release:
> https://docs.google.com/spreadsheets/d/12h1rTAPS1gm0vhlDGxeOXjnRD7rrOcoqzX4rjRRCyBg
> 
> It shows up that Cloudera release is doing in (a) and pick up few commits.
> However, It does pick up some commits in batch from branch-3.x (e.g.
> commits of LocalCatalog). I think it's a good example for (a).
> 
> However, (a) needs more efforts than (b). If we doing in way (b), we just
> need to fix cherry-pick conflicts, run GVO and then merge the commit if the
> tests are passed.
> 
> What do you think? Could anyone share some experience about how other
> projects (e.g. Hadoop, Hive, HBase) manage several branches together?
> 
> Thanks,
> Quanlong Huang