Posted to dev@hive.apache.org by Peter Vary <pv...@cloudera.com.INVALID> on 2022/04/29 11:46:06 UTC

Release cadence

Hi Team,

Zoltan Haindrich and I have been brainstorming about the next steps after the 4.0.0-alpha-1 release.

We came up with the following plan:
- Define a desired scope for the 4.0.0 release
- Release at least quarterly - create alpha release(s) until the scope is reached
- If the scope is reached - create a beta release
- For fixes - create a beta release
- If we are satisfied with the quality of the release then we can release Hive 4.0.0
- Keep up with the quarterly release cadence
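
The steps above can be read as a small state machine. Below is a minimal Java sketch of that reading; the names ReleasePhase and ReleasePlanner are purely illustrative, and treating "For fixes" as "cut further betas until quality is accepted" is an assumption, not something the plan spells out.

```java
// Hypothetical sketch of the proposed release flow; all names are illustrative.
enum ReleasePhase { ALPHA, BETA, GA }

final class ReleasePlanner {
    private ReleasePlanner() {}

    /**
     * Picks the kind of release to cut at the next quarterly checkpoint.
     *
     * @param current         phase of the most recent release
     * @param scopeReached    whether the agreed 4.0.0 scope is complete
     * @param qualityAccepted whether the community is satisfied with the quality
     */
    static ReleasePhase next(ReleasePhase current, boolean scopeReached, boolean qualityAccepted) {
        switch (current) {
            case ALPHA:
                // Keep cutting alphas until the scope is reached, then move to beta.
                return scopeReached ? ReleasePhase.BETA : ReleasePhase.ALPHA;
            case BETA:
                // Assumption: fixes go into further betas until quality is accepted.
                return qualityAccepted ? ReleasePhase.GA : ReleasePhase.BETA;
            default:
                // After 4.0.0 the quarterly cadence continues with regular releases.
                return ReleasePhase.GA;
        }
    }
}
```

For example, ReleasePlanner.next(ReleasePhase.ALPHA, true, false) would yield BETA, while the same call with scopeReached set to false stays on ALPHA.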

So far we have collected the following items which could be part of the scope:
- Java 11 upgrade (at minimum)
- Hadoop 3.3 (needed for the Java 11 upgrade)
- Full Iceberg integration (Read, Write, Delete, Update, Merge)
- Clean up the HMS API interface (deprecate old methods which are already released, remove methods which have not been released yet, use/create methods with Request objects as parameters instead of Context objects)
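
As a rough illustration of the HMS API cleanup item, here is a minimal Java sketch of the general pattern: keep one method that takes a Request object, and deprecate the older overload so that it only delegates. Every name below (DescribeTableRequest, ExampleCatalogClient, DEFAULT_CATALOG) is hypothetical and is not taken from the actual metastore API.

```java
import java.util.Objects;

// Hypothetical request object (not a real metastore class): it bundles the
// call parameters so new optional fields can be added without a new overload.
final class DescribeTableRequest {
    private final String catalog;
    private final String database;
    private final String table;

    DescribeTableRequest(String catalog, String database, String table) {
        this.catalog = Objects.requireNonNull(catalog, "catalog");
        this.database = Objects.requireNonNull(database, "database");
        this.table = Objects.requireNonNull(table, "table");
    }

    String catalog() { return catalog; }
    String database() { return database; }
    String table() { return table; }
}

// Hypothetical client interface, used only to illustrate the pattern.
interface ExampleCatalogClient {
    // Assumed placeholder for whatever default a real API would apply.
    String DEFAULT_CATALOG = "example_catalog";

    /** Preferred form: one request object as the only parameter. */
    String describeTable(DescribeTableRequest request);

    /** Older positional form: deprecated, and delegates to the request form. */
    @Deprecated
    default String describeTable(String database, String table) {
        return describeTable(new DescribeTableRequest(DEFAULT_CATALOG, database, table));
    }
}
```

With this shape, already released callers keep compiling against the deprecated overload, while new parameters only ever need to be added to the request class.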

We might want to collect information about the usage of specific modules, and might deprecate some based on the feedback (remove them from the release or at least mark them deprecated), so we can reduce the project complexity. Some features which popped up:
- HCatalog
- WebHCat
- Pig integration
- ??

We would be interested in any feedback on this plan / scope / deprecation. Feel free to suggest any additions or removals from these lists, or even propose an entirely different plan.
Also, if you would like to take over specific tasks, feel free to grab them and start working on them or start discussing them.

Thanks,
Peter

Re: Release cadence

Posted by Stamatis Zampetakis <za...@gmail.com>.
Thanks Peter and Zoltan for starting this discussion. Apologies for the
delay but I had the impression that I already sent this email.

I definitely agree with the idea of having quarterly releases no matter
what they are called (alpha, beta, or other).

I wouldn't base the decision to move from alpha to beta so much on features
like Iceberg or JDK11 but rather on items that show the stability of the
release. Generally, I would be confident to move from alpha to beta if we:
* deploy Hive in a realistic setting and go over a few use-cases
* don't identify serious regressions in the last X months

Setting up Hive on a cluster of 5-10 machines and successfully running all
TPC-DS queries over 10GB-100GB would be enough for me to say that we have a
realistic deployment.
Others would probably have different expectations/use-cases so we should
agree on the bare minimum that we would like to have since there is no way
to cover everything.

We released the alpha-1 version so that people can try it out and give
feedback about it. I really hope users take the time to test the recent
releases.
If for a certain period of time we see that there are no
serious regressions then we can move gradually from alpha to beta.
We can discuss the exact amount of time that we want to wait but I would
say that if we get another alpha release out (alpha-2) and people do not
report any serious problems we could move to beta-1.

For moving from beta to stable, I would follow the same scheme; stable
deployment and no important regressions for a reasonable amount of time.

It would be nice if, in the next few releases, we clarified what the packaging
bundles for Hive, metastore, storage-api, etc. are supposed to be, to avoid
confusion for end-users [1].
I was also gonna mention that it would be good to have javadoc for the next
release [2] but just realized that this is already fixed :) (thanks Peter,
and Zoltan!)

Regarding the discussion about the exec jar I more or less share the same
opinion with Zoltan but we can continue the discussion under the respective
JIRA [3].

Best,
Stamatis

[1] https://issues.apache.org/jira/browse/HIVE-26218
[2] https://issues.apache.org/jira/browse/HIVE-26092
[3] https://issues.apache.org/jira/browse/HIVE-26220


Re: Release cadence

Posted by Zoltan Haindrich <ki...@rxd.hu>.
Hey,


>> In another email thread (https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s) Sun Chao mentioned that other projects (Spark,
>> Iceberg and Trino/Presto) are still depending on old Hive, because the exec-core jar has been removed, and the exec jar contains unshaded versions of various
>> dependencies. Until this is fixed, they can not upgrade to a newer version of Hive, so I would like to add this as a blocker for Hive 4.0.0 release.
>>
>> @Chao Sun: Could you help us find the jira for this issue, or file a new one?

I was thinking about this, and I think it is a bit unfair. Say project X is using Hive 2.3's core jar; should "we", the Hive community, do all the work to run their project with Hive 4? I don't think so.
What if some project is not interested in upgrading? Should we really put effort into this even in that case?

The best middle-ground idea I have been able to come up with so far is to ask each project for a development branch (possibly still broken) set up to run with some 4.0.0-alpha-X release, where we can start fixing together the shading issues they might face.
That way they will already be prepared to upgrade their Hive; and if they are also able to run their tests, etc., then as a bonus we will get early pre-integration feedback, which will be valuable for both them and us.

What do you guys think?
Are there any other options?

cheers,
Zoltan


Re: Release cadence

Posted by Chao Sun <su...@apache.org>.
Thanks for reminding me, Peter. There is
https://issues.apache.org/jira/browse/HIVE-25317 but that's for Hive
2.3 and is mostly for the Spark use case. I just created
https://issues.apache.org/jira/browse/HIVE-26220 and marked it as a
blocker.


Re: Release cadence

Posted by Peter Vary <pv...@cloudera.com.INVALID>.
In another email thread (
https://lists.apache.org/thread/sxcrcf4v9j630tl9domp0bn4m33bdq0s) Sun Chao
mentioned that other projects (Spark, Iceberg and Trino/Presto) still
depend on old Hive versions, because the exec-core jar has been removed and
the exec jar contains unshaded versions of various dependencies. Until this
is fixed, they cannot upgrade to a newer version of Hive, so I would like to
add this as a blocker for the Hive 4.0.0 release.
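
To make "unshaded versions of various dependencies" a bit more concrete, here is a minimal Java sketch of how one might spot third-party classes sitting at their original package names inside a jar. It is only an illustration: the class ShadingCheck, its method, and the choice of "expected" prefixes are hypothetical, and it works on a plain list of jar entry names rather than on a real Hive artifact.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Hive: classifies jar entry names by package root.
final class ShadingCheck {
    private ShadingCheck() {}

    /**
     * Returns the package roots (up to two leading path segments) of class
     * entries that do not start with any of the expected prefixes. Anything
     * returned is a candidate for an unshaded, non-relocated dependency.
     */
    static Set<String> foreignPackageRoots(List<String> entryNames, List<String> expectedPrefixes) {
        Set<String> roots = new TreeSet<>();
        for (String name : entryNames) {
            if (!name.endsWith(".class")) {
                continue; // skip manifests, resources and directory entries
            }
            boolean expected = false;
            for (String prefix : expectedPrefixes) {
                if (name.startsWith(prefix)) {
                    expected = true;
                    break;
                }
            }
            if (expected) {
                continue; // project-owned or relocated class
            }
            String[] parts = name.split("/");
            if (parts.length == 1) {
                roots.add("(default package)");
            } else if (parts.length == 2) {
                roots.add(parts[0]);
            } else {
                roots.add(parts[0] + "/" + parts[1]);
            }
        }
        return roots;
    }
}
```

The entry names could be collected from java.util.jar.JarFile; which prefixes count as "expected" (for example the project's own packages plus an assumed relocation prefix) would depend on how the jar is actually built.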

@Chao Sun <su...@apache.org>: Could you help us find the jira for this
issue, or file a new one?

Any more blockers?

Thanks,
Peter
