Posted to user@hive.apache.org by Sungwoo Park <gl...@gmail.com> on 2021/03/18 08:02:56 UTC

Maintaining Hive 2 and 3 branches,

Hello Hive users,

After attending the Hive meetup yesterday (huge thanks to the organizers!),
I suspect that many organizations are maintaining their own Hive 2 and 3
branches by backporting important patches to vanilla Hive. Ideally, all
the important patches would be merged regularly into the Hive 2 and 3
branches (e.g., branch-2.3 and branch-3.1), but I guess this would take a
lot of time and effort on the Hive committer side, and at the moment most
of the effort seems to be directed at the master branch.

I find this process of backporting patches to Hive 2 and 3 branches quite
challenging and time-consuming, especially for "outsiders" who have not
implemented or reviewed the patches. The problem is two-fold: 1) you have
to decide which patches to apply and in what order; 2) you have to run all
the tests to make sure that the new patches are compatible with the code
base and do not introduce new bugs.

1) is not easy because sometimes a patch from the master branch fails to
apply because of missing dependencies. In such a case, you have to go back
through the commit history, identify the commits it depends on, and merge
them first. Depending on how extensive the changes in the patch are, this
can be a big pain.
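As a rough illustration of point 1): once the prerequisite commits of each
patch are known (finding them is the manual part described above), picking
an order in which to apply them is a topological sort. The following is a
minimal, hypothetical Python sketch (Python 3.9+ for graphlib); the function
backport_order and the patch labels are made up for illustration and are
not taken from any existing tool.

```python
from graphlib import TopologicalSorter, CycleError

def backport_order(wanted, deps, already_applied=frozenset()):
    """Return patches so that every prerequisite comes before its dependents.

    wanted:          patches we want to backport.
    deps:            maps a patch to the patches it depends on (assumed known,
                     e.g. found by inspecting the commit history by hand).
    already_applied: patches already present in the target branch (skipped).
    """
    graph = {}
    stack = list(wanted)
    # Pull in transitive prerequisites that are not yet in the branch.
    while stack:
        patch = stack.pop()
        if patch in graph or patch in already_applied:
            continue
        prereqs = [p for p in deps.get(patch, ()) if p not in already_applied]
        graph[patch] = prereqs
        stack.extend(prereqs)
    try:
        # graphlib expects node -> predecessors and yields predecessors first.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"circular dependency among patches: {err.args[1]}") from err

deps = {"patch-C": ["patch-B"], "patch-B": ["patch-A"]}
print(backport_order(["patch-C"], deps))
print(backport_order(["patch-C"], deps, already_applied={"patch-A"}))
```

Patches already in the target branch are left out of the result, and a
circular dependency is reported as an error instead of silently producing
an order.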

2) can also be a problem when applying a new patch changes the test
results. Sometimes a patch merges with no conflicts, but some tests fail.
Besides, running the tests themselves may take a lot of time.
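For point 2), one simple check is to compare per-test outcomes before and
after applying a patch and flag tests that used to pass but no longer do.
The Python sketch below is hypothetical: it assumes the results are
available as JUnit-style XML reports (the format Maven Surefire writes),
the function names are made up for illustration, and it does nothing about
the cost of running the tests in the first place.

```python
import xml.etree.ElementTree as ET

def outcomes(junit_xml):
    """Map 'classname.name' to PASS, FAIL or SKIP for one JUnit-style report."""
    root = ET.fromstring(junit_xml)
    result = {}
    for case in root.iter("testcase"):
        key = f"{case.get('classname')}.{case.get('name')}"
        if case.find("failure") is not None or case.find("error") is not None:
            result[key] = "FAIL"
        elif case.find("skipped") is not None:
            result[key] = "SKIP"
        else:
            result[key] = "PASS"
    return result

def regressions(before, after):
    """Tests that passed before the patch but do not pass after it (incl. missing)."""
    return sorted(t for t, r in before.items() if r == "PASS" and after.get(t) != "PASS")

def newly_fixed(before, after):
    """Tests that did not pass before the patch but pass after it."""
    return sorted(t for t, r in after.items() if r == "PASS" and before.get(t) != "PASS")
```

A patch with an empty regressions() list is not proven safe; it only means
the tests that were run did not get worse.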

So, I wonder if anyone could share their experience and wisdom on how to
maintain Hive 2 and 3 branches, or share their git repos. For our part, we
have applied about 210 patches to Hive 3.1.3 (since Nov 2, 2020), and are
in the middle of applying an additional 100+ patches. You can find our work
in the following repo. (You can ignore the last commit, which is internal
to our work.)

https://github.com/mr3project/hive-mr3/commits/master3

Thanks,

--- Sungwoo Park

Re: Maintaining Hive 2 and 3 branches,

Posted by Sungwoo Park <gl...@gmail.com>.
Hi Peter,

> - Are these patches you mention below bugfixes, or new features on Hive
> 3.1.3? (This might be a typo as I think the last Hive release is 3.1.2)
>

They are a collection of bug fixes and improvements picked from the master
and branch-3 branches. The list is mostly based on the additional commits
found in HDP 3.1.5 and Qubole Hive 3 relative to Hive 3.1.2. I mistakenly
mentioned Hive 3.1.3 because our repo includes the last few commits in
branch-3.1, which set the Hive version to 3.1.3 in pom.xml. You can think
of the list as about 210 commits applied to Hive 3.1.2.


> - Could you backport these patches to the apache branch-3, and branch-3.1?
> - Is there any reason not to?
>

I could backport these patches to branch-3.1 (branch-3 would need a
separate list of commits), but I can think of two potential problems.

1) Stability of the branch after applying these patches.
As ordinary users of Hive, we cannot convince ourselves that the patches
can be applied safely because we don't have definitive criteria like
"everything is okay if the code passes these tests". So I think either we
should be given such criteria, or someone else (Hive committers) should
manually inspect the individual patches and test results again.

2) As our repo is essentially a fork of Hive 3.1.2, we cannot apply these
patches to branch-3.1 in their current form.


> I am asking this because I think the best way to move forward is to
> consolidate these backports to a single repo, preferably to the apache one,
> so everyone can benefit from it.
>

Indeed. I hope we can figure out how to make progress on this problem.

Thanks,

--- Sungwoo

Re: Maintaining Hive 2 and 3 branches,

Posted by Peter Vary <pv...@cloudera.com>.
Thanks Sungwoo for sharing this!

A few questions:
- Are these patches you mention below bugfixes, or new features on Hive 3.1.3? (This might be a typo as I think the last Hive release is 3.1.2)
- Could you backport these patches to the apache branch-3, and branch-3.1?
- Is there any reason not to?

I am asking this because I think the best way to move forward is to consolidate these backports to a single repo, preferably to the apache one, so everyone can benefit from it.

What do you think?

Thanks,
Peter
