Posted to dev@hive.apache.org by vihang karajgaonkar <vi...@apache.org> on 2023/05/22 18:14:46 UTC

[DISCUSS] Nightly snapshot builds

Hello Team,

I have observed that users commonly want to test unreleased features and bug
fixes, either to unblock themselves or to check whether a fix really works as
intended in their environments. For Apache Hive this is currently not very
user-friendly, because it requires the end user to build the binaries
directly from the Hive source code.

I found that Apache Spark has a very useful infrastructure [1] which
deploys nightly snapshots [2] [3] from the branch using GitHub Actions.
This is super useful for any user who wants to try out the latest and
greatest using the nightly builds.

I was wondering if we should also adopt this. We can use GitHub Actions to
upload the snapshot jars to a public repository (e.g., GitHub Packages) and
schedule it as a nightly job.

[1] https://issues.apache.org/jira/browse/INFRA-21167
[2] https://github.com/apache/spark/pkgs/container/apache-spark-ci-image
[3] https://github.com/apache/spark/pull/30623

I can take a stab at this if the community thinks that this is a nice thing
to have.

Thanks,
Vihang

Re: [DISCUSS] Nightly snapshot builds

Posted by vihang karajgaonkar <vi...@apache.org>.
Thanks Zoltan. Makes sense. Also, we should definitely strive to release
within 180 days especially when there are lots of commits to a branch.

-Vihang

On Fri, May 26, 2023 at 12:04 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

> On 5/25/23 19:58, vihang karajgaonkar wrote:
> > I just tried the job and it worked as expected. Thanks! If I understand
> > correctly, the job retains builds for 180 days. Does it mean if there were
> > no commits to a branch for more than 180 days, we will lose the build
> > artifacts eventually?
>
> not entirely - the removal of old builds is a post-build action; which
> means - if there are no builds; the removal logic will never run
> https://plugins.jenkins.io/discard-old-build/
>
> on the other hand I wonder how much value a nightly build can still
> provide after 180 days :)
> preferably - a real release should be done after some time :)
>
> cheers,
> Zoltan

Re: [DISCUSS] Nightly snapshot builds

Posted by Zoltan Haindrich <ki...@rxd.hu>.
On 5/25/23 19:58, vihang karajgaonkar wrote:
> I just tried the job and it worked as expected. Thanks! If I understand
> correctly, the job retains builds for 180 days. Does it mean if there were
> no commits to a branch for more than 180 days, we will lose the build
> artifacts eventually?

Not entirely. The removal of old builds is a post-build action, which means that if there are no new builds, the removal logic never runs:
https://plugins.jenkins.io/discard-old-build/

On the other hand, I wonder how much value a nightly build can still provide after 180 days :)
Preferably, a real release should be done after some time :)
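
The post-build point can be illustrated with a small toy model. This is only a sketch under stated assumptions: run_build, RETENTION_DAYS and the dates are hypothetical names and values chosen for illustration, and the code does not reproduce the Jenkins plugin's actual implementation. It only shows that a cleanup step tied to a build is never reached while the branch is idle.

```python
from datetime import date, timedelta

RETENTION_DAYS = 180  # retention window mentioned in the thread


def run_build(builds, today):
    """Record a new build, then discard old ones as a post-build step."""
    builds.append(today)
    cutoff = today - timedelta(days=RETENTION_DAYS)
    # The cleanup lives inside the build; with no build it is never reached.
    builds[:] = [d for d in builds if d >= cutoff]


builds = []
run_build(builds, date(2023, 1, 1))
run_build(builds, date(2023, 1, 2))

# A year passes with no commits, so no build (and no cleanup) happens.
kept_while_idle = list(builds)

# The next build finally triggers the cleanup of both old builds.
run_build(builds, date(2024, 1, 2))
```

In this model both January 2023 builds survive the idle year, and they are discarded only when the January 2024 build runs its post-build cleanup.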

cheers,
Zoltan


Re: [DISCUSS] Nightly snapshot builds

Posted by vihang karajgaonkar <vi...@apache.org>.
I just tried the job and it worked as expected. Thanks! If I understand
correctly, the job retains builds for 180 days. Does it mean if there were
no commits to a branch for more than 180 days, we will lose the build
artifacts eventually?

On Thu, May 25, 2023 at 1:50 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

> Hey Vihang,
>
> I've added you as an admin; and I've copied the job as
> http://ci.hive.apache.org/job/hive-nightly-branch-3/
> other option could be to trigger the original job or use
> parameterized-scheduler  but that would configure a real unconditional
> nightly build - which will just build the
> same version over-and-over again if there are no changes...
> ...the current nightly is SCM triggered ; but only once-a-day it makes a
> check which creates the desired results.
>
> the least painful was to copy the job; I guess no-one touched the
> pipeline script ever since it was introduced :D
>
> cheers,
> Zoltan
>
> On 5/25/23 01:26, vihang karajgaonkar wrote:
> > I created https://issues.apache.org/jira/browse/HIVE-27371 to have
> nightly
> > builds for branch-3. Once that is merged, I think we can have scheduled
> > builds for branch-3 as well. Although, I don't have permissions to
> create a
> > new job for branch-3. Does anyone know how to do it?
> >
> > Thanks,
> > Vihang
> >
> > On Wed, May 24, 2023 at 10:07 AM vihang karajgaonkar <
> vihangk1@apache.org>
> > wrote:
> >
> >> The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great.
> Can
> >> we have this for branch-3 as well since we have been backporting a lot
> of
> >> PRs to branch-3 lately.
> >>
> >> Thanks,
> >> Vihang
> >>
> >>
> >>
> >>
> >>
> >> On Wed, May 24, 2023 at 6:56 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
> >>
> >>> Hey,
> >>>
> >>>   > We already have nightly builds for Hive [1].
> >>>   > [1] http://ci.hive.apache.org/job/hive-nightly/
> >>>
> >>> ...and hive-dev-box can launch such archives; either by using it like
> >>> this:
> >>> https://www.mail-archive.com/dev@hive.apache.org/msg142420.html
> >>>
> >>> or with a somewhat longer command you could launch hdb in bazaar mode;
> >>> and have an HS2 running with a nightly version:
> >>>
> >>> docker run --rm -d -p 10000:10000 -v hive-dev-box_work:/work \
> >>>   -e HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz \
> >>>   --name hive kgyrtkirk/hive-dev-box:bazaar
> >>>
> >>> cheers,
> >>> Zoltan
> >>>
> >>> On 5/24/23 09:15, Stamatis Zampetakis wrote:
> >>>> Hey all,
> >>>>
> >>>> We already have nightly builds for Hive [1].
> >>>>
> >>>> Do we need something more than that?
> >>>>
> >>>> Best,
> >>>> Stamatis
> >>>>
> >>>> [1] http://ci.hive.apache.org/job/hive-nightly/
> >>>>
> >>>>
> >>>> On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <
> >>> vihangk1@apache.org> wrote:
> >>>>>
> >>>>> I think there are many benefits like others in this thread suggested
> >>> which
> >>>>> can be built on top of nightly builds. Having docker images is great
> >>> but
> >>>>> for now I think we can start simple and publish the jars. Many users
> >>> still
> >>>>> just deploy using jars and it would be useful to them. Once we have a
> >>>>> docker environment we can add a docker image too to the nightly
> builds
> >>> so
> >>>>> that users can choose their preferred way.
> >>>>>
> >>>>> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park <gl...@gmail.com>
> >>> wrote:
> >>>>>
> >>>>>> I think such nightly builds will be useful for testing and debugging
> >>> in the
> >>>>>> future.
> >>>>>>
> >>>>>> I also wonder if we can somehow create builds even from previous
> >>> commits
> >>>>>> (e.g., for the past few years). Such builds from previous commits
> >>> don't
> >>>>>> have to be daily builds, and I think weekly builds (or even monthly
> >>> builds)
> >>>>>> would also be very useful.
> >>>>>>
> >>>>>> The reason I wish such builds were available is to facilitate
> >>> debugging and
> >>>>>> testing. When tested against the TPC-DS benchmark, the current
> master
> >>>>>> branch has several correctness problems that were introduced after
> the
> >>>>>> release of Hive 3.1.2. We have reported all problems known to us in
> >>> [1] and
> >>>>>> also submitted several patches. If such nightly builds had been
> >>> available,
> >>>>>> we would have saved quite a bit of time for implementing the patches
> >>> by
> >>>>>> quickly finding offending commits that introduced new correctness
> >>> bugs.
> >>>>>>
> >>>>>> In addition, you can find quite a few commits in the master branch
> >>> that
> >>>>>> report bugs which are not reproduced in Hive 3.1.2. Examples:
> >>> HIVE-19990,
> >>>>>> HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
> >>>>>> HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
> >>>>>> HIVE-25170, HIVE-25864, HIVE-26671.
> >>>>>> (There may be some errors in this list because we compared against
> >>> Hive
> >>>>>> 3.1.2 with many patches backported.) Such nightly builds can be
> >>> useful for
> >>>>>> finding root causes of such bugs.
> >>>>>>
> >>>>>> Ideally I wish there was an automated procedure to create nightly
> >>> builds,
> >>>>>> run TPC-DS benchmark, and report correctness/performance results,
> >>> although
> >>>>>> this would be quite hard to implement. (I remember Spark implemented
> >>> this
> >>>>>> procedure in the era of Spark 2, but my memory could be wrong.)
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/HIVE-26654
> >>>>>>
> >>>>>>
> >>>>>> On Tue, May 23, 2023 at 10:44 AM Ayush Saxena <ay...@gmail.com>
> >>> wrote:
> >>>>>>
> >>>>>>> Hi Vihang,
> >>>>>>> +1, We were even exploring publishing the docker images of the
> >>> snapshot
> >>>>>>> version as well per commit or maybe weekly, so just shoot 2 docker
> >>>>>> commands
> >>>>>>> and you get a Hive cluster running with master code.
> >>>>>>>
> >>>>>>> Sai, I think to spin up an env via Docker with all these things
> >>> should be
> >>>>>>> doable for sure, but would require someone with real good expertise
> >>> with
> >>>>>>> docker as well as setting up these services with Hive. Obviously, I
> >>> am
> >>>>>> not
> >>>>>>> that guy :-)
> >>>>>>>
> >>>>>>> @Simhadri has a PR which publishes docker images once a release tag
> >>> is
> >>>>>>> pushed, you can explore to have similar stuff for the Snapshot
> >>> version,
> >>>>>>> maybe if that sounds cool
> >>>>>>>
> >>>>>>> -Ayush
> >>>>>>>
> >>>>>>> On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
> >>>>>>> <sa...@cloudera.com.invalid> wrote:
> >>>>>>>
> >>>>>>>> Hi Vihang,
> >>>>>>>>
> >>>>>>>> +1 on the idea.
> >>>>>>>>
> >>>>>>>> This is a great idea to quickly test if a certain feature is
> >>> working as
> >>>>>>>> expected on a certain branch.
> >>>>>>>> This way we test data loss, correctness, or any other unexpected
> >>>>>>> scenarios
> >>>>>>>> that are Hive specific only. However, I'm wondering if it is
> >>> possible
> >>>>>> to
> >>>>>>>> deploy/test in a kerberized environment or issues involving
> >>>>>> authorization
> >>>>>>>> services like sentry/ranger.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Sai.
> >>>>>>>>
> >>>>>>>> On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <
> >>>>>>> vihangk1@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hello Team,
> >>>>>>>>>
> >>>>>>>>> I have observed that it is a common use-case where users would
> like
> >>>>>> to
> >>>>>>>> test
> >>>>>>>>> out unreleased features/bug fixes either to unblock them or test
> >>> out
> >>>>>> if
> >>>>>>>> the
> >>>>>>>>> bug fixes really work as intended in their environments. Today in
> >>> the
> >>>>>>>> case
> >>>>>>>>> of Apache Hive, this is not very user friendly because it
> requires
> >>>>>> the
> >>>>>>>> end
> >>>>>>>>> user to build the binaries directly from the hive source code.
> >>>>>>>>>
> >>>>>>>>> I found that Apache Spark has a very useful infrastructure [1]
> >>> which
> >>>>>>>>> deploys nightly snapshots [2] [3] from the branch using github
> >>>>>> actions.
> >>>>>>>>> This is super useful for any user who wants to try out the latest
> >>> and
> >>>>>>>>> greatest using the nightly builds.
> >>>>>>>>>
> >>>>>>>>> I was wondering if we should also adopt this. We can use github
> >>>>>> actions
> >>>>>>>> to
> >>>>>>>>> upload the snapshot jars to the public repository (e.g github
> >>>>>> packages)
> >>>>>>>> and
> >>>>>>>>> schedule it as a nightly job.
> >>>>>>>>>
> >>>>>>>>> [1] https://issues.apache.org/jira/browse/INFRA-21167
> >>>>>>>>> [2]
> >>>>>>>
> https://github.com/apache/spark/pkgs/container/apache-spark-ci-image
> >>>>>>>>> [3] https://github.com/apache/spark/pull/30623
> >>>>>>>>>
> >>>>>>>>> I can take a stab at this if the community thinks that this is a
> >>> nice
> >>>>>>>> thing
> >>>>>>>>> to have.
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>> Vihang
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>
> >>
> >
>

Re: [DISCUSS] Nightly snapshot builds

Posted by Zoltan Haindrich <ki...@rxd.hu>.
Hey Vihang,

I've added you as an admin, and I've copied the job as http://ci.hive.apache.org/job/hive-nightly-branch-3/
Another option would be to trigger the original job or use parameterized-scheduler, but that would configure a truly unconditional nightly build, which would just build the same version over and over again if there are no changes.
The current nightly is SCM-triggered, but it checks for changes only once a day, which gives the desired result.

The least painful option was to copy the job; I guess no one has touched the pipeline script since it was introduced :D
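
The difference between the two trigger styles can be sketched as a small decision function. This is a toy model, not the Jenkins job configuration: the policy names "unconditional" and "scm-polled", the helper functions, and the commit ids are all hypothetical and chosen only for illustration.

```python
def should_build(policy, head_commit, last_built_commit):
    """Decide whether the nightly check starts a build (policy names are hypothetical)."""
    if policy == "unconditional":
        # A plain nightly schedule builds regardless of changes.
        return True
    if policy == "scm-polled":
        # A once-a-day SCM check builds only if the branch head moved.
        return head_commit != last_built_commit
    raise ValueError(f"unknown policy: {policy}")


def count_builds(policy, nightly_heads):
    """Count builds over a sequence of branch heads, one per night."""
    last_built = None
    builds = 0
    for head in nightly_heads:
        if should_build(policy, head, last_built):
            last_built = head
            builds += 1
    return builds


# Made-up branch heads for seven nights; the branch changes only twice.
heads = ["a1", "a1", "b2", "b2", "b2", "c3", "c3"]
print(count_builds("unconditional", heads))
print(count_builds("scm-polled", heads))
```

Over these seven nights the unconditional policy builds 7 times, while the SCM-polled policy builds 3 times, once for each distinct branch head.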

cheers,
Zoltan

On 5/25/23 01:26, vihang karajgaonkar wrote:
> I created https://issues.apache.org/jira/browse/HIVE-27371 to have nightly
> builds for branch-3. Once that is merged, I think we can have scheduled
> builds for branch-3 as well. Although, I don't have permissions to create a
> new job for branch-3. Does anyone know how to do it?
> 
> Thanks,
> Vihang
> 
> On Wed, May 24, 2023 at 10:07 AM vihang karajgaonkar <vi...@apache.org>
> wrote:
> 
>> The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great. Can
>> we have this for branch-3 as well since we have been backporting a lot of
>> PRs to branch-3 lately.
>>
>> Thanks,
>> Vihang
>>
>>
>>
>>
>>
>> On Wed, May 24, 2023 at 6:56 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
>>
>>> Hey,
>>>
>>>   > We already have nightly builds for Hive [1].
>>>   > [1] http://ci.hive.apache.org/job/hive-nightly/
>>>
>>> ...and hive-dev-box can launch such archives; either by using it like
>>> this:
>>> https://www.mail-archive.com/dev@hive.apache.org/msg142420.html
>>>
>>> or with a somewhat longer command you could launch hdb in bazaar mode;
>>> and have an HS2 running with a nightly version:
>>>
>>> docker run --rm -d -p 10000:10000 -v hive-dev-box_work:/work \
>>>   -e HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz \
>>>   --name hive kgyrtkirk/hive-dev-box:bazaar
>>>
>>> cheers,
>>> Zoltan
>>>
>>> On 5/24/23 09:15, Stamatis Zampetakis wrote:
>>>> Hey all,
>>>>
>>>> We already have nightly builds for Hive [1].
>>>>
>>>> Do we need something more than that?
>>>>
>>>> Best,
>>>> Stamatis
>>>>
>>>> [1] http://ci.hive.apache.org/job/hive-nightly/
>>>>
>>>>
>>>> On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <
>>> vihangk1@apache.org> wrote:
>>>>>
>>>>> I think there are many benefits like others in this thread suggested
>>> which
>>>>> can be built on top of nightly builds. Having docker images is great
>>> but
>>>>> for now I think we can start simple and publish the jars. Many users
>>> still
>>>>> just deploy using jars and it would be useful to them. Once we have a
>>>>> docker environment we can add a docker image too to the nightly builds
>>> so
>>>>> that users can choose their preferred way.
>>>>>
>>>>> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park <gl...@gmail.com>
>>> wrote:
>>>>>
>>>>>> I think such nightly builds will be useful for testing and debugging
>>> in the
>>>>>> future.
>>>>>>
>>>>>> I also wonder if we can somehow create builds even from previous
>>> commits
>>>>>> (e.g., for the past few years). Such builds from previous commits
>>> don't
>>>>>> have to be daily builds, and I think weekly builds (or even monthly
>>> builds)
>>>>>> would also be very useful.
>>>>>>
>>>>>> The reason I wish such builds were available is to facilitate
>>> debugging and
>>>>>> testing. When tested against the TPC-DS benchmark, the current master
>>>>>> branch has several correctness problems that were introduced after the
>>>>>> release of Hive 3.1.2. We have reported all problems known to us in
>>> [1] and
>>>>>> also submitted several patches. If such nightly builds had been
>>> available,
>>>>>> we would have saved quite a bit of time for implementing the patches
>>> by
>>>>>> quickly finding offending commits that introduced new correctness
>>> bugs.
>>>>>>
>>>>>> In addition, you can find quite a few commits in the master branch
>>> that
>>>>>> report bugs which are not reproduced in Hive 3.1.2. Examples:
>>> HIVE-19990,
>>>>>> HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
>>>>>> HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
>>>>>> HIVE-25170, HIVE-25864, HIVE-26671.
>>>>>> (There may be some errors in this list because we compared against
>>> Hive
>>>>>> 3.1.2 with many patches backported.) Such nightly builds can be
>>> useful for
>>>>>> finding root causes of such bugs.
>>>>>>
>>>>>> Ideally I wish there was an automated procedure to create nightly
>>> builds,
>>>>>> run TPC-DS benchmark, and report correctness/performance results,
>>> although
>>>>>> this would be quite hard to implement. (I remember Spark implemented
>>> this
>>>>>> procedure in the era of Spark 2, but my memory could be wrong.)
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/HIVE-26654
>>>>>>
>>>>>>
>>>>>> On Tue, May 23, 2023 at 10:44 AM Ayush Saxena <ay...@gmail.com>
>>> wrote:
>>>>>>
>>>>>>> Hi Vihang,
>>>>>>> +1, We were even exploring publishing the docker images of the
>>> snapshot
>>>>>>> version as well per commit or maybe weekly, so just shoot 2 docker
>>>>>> commands
>>>>>>> and you get a Hive cluster running with master code.
>>>>>>>
>>>>>>> Sai, I think to spin up an env via Docker with all these things
>>> should be
>>>>>>> doable for sure, but would require someone with real good expertise
>>> with
>>>>>>> docker as well as setting up these services with Hive. Obviously, I
>>> am
>>>>>> not
>>>>>>> that guy :-)
>>>>>>>
>>>>>>> @Simhadri has a PR which publishes docker images once a release tag
>>> is
>>>>>>> pushed, you can explore to have similar stuff for the Snapshot
>>> version,
>>>>>>> maybe if that sounds cool
>>>>>>>
>>>>>>> -Ayush
>>>>>>>
>>>>>>> On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
>>>>>>> <sa...@cloudera.com.invalid> wrote:
>>>>>>>
>>>>>>>> Hi Vihang,
>>>>>>>>
>>>>>>>> +1 on the idea.
>>>>>>>>
>>>>>>>> This is a great idea to quickly test if a certain feature is
>>> working as
>>>>>>>> expected on a certain branch.
>>>>>>>> This way we test data loss, correctness, or any other unexpected
>>>>>>> scenarios
>>>>>>>> that are Hive specific only. However, I'm wondering if it is
>>> possible
>>>>>> to
>>>>>>>> deploy/test in a kerberized environment or issues involving
>>>>>> authorization
>>>>>>>> services like sentry/ranger.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Sai.
>>>>>>>>
>>>>>>>> On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <
>>>>>>> vihangk1@apache.org>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hello Team,
>>>>>>>>>
>>>>>>>>> I have observed that it is a common use-case where users would like
>>>>>> to
>>>>>>>> test
>>>>>>>>> out unreleased features/bug fixes either to unblock them or test
>>> out
>>>>>> if
>>>>>>>> the
>>>>>>>>> bug fixes really work as intended in their environments. Today in
>>> the
>>>>>>>> case
>>>>>>>>> of Apache Hive, this is not very user friendly because it requires
>>>>>> the
>>>>>>>> end
>>>>>>>>> user to build the binaries directly from the hive source code.
>>>>>>>>>
>>>>>>>>> I found that Apache Spark has a very useful infrastructure [1]
>>> which
>>>>>>>>> deploys nightly snapshots [2] [3] from the branch using github
>>>>>> actions.
>>>>>>>>> This is super useful for any user who wants to try out the latest
>>> and
>>>>>>>>> greatest using the nightly builds.
>>>>>>>>>
>>>>>>>>> I was wondering if we should also adopt this. We can use github
>>>>>> actions
>>>>>>>> to
>>>>>>>>> upload the snapshot jars to the public repository (e.g github
>>>>>> packages)
>>>>>>>> and
>>>>>>>>> schedule it as a nightly job.
>>>>>>>>>
>>>>>>>>> [1] https://issues.apache.org/jira/browse/INFRA-21167
>>>>>>>>> [2]
>>>>>>> https://github.com/apache/spark/pkgs/container/apache-spark-ci-image
>>>>>>>>> [3] https://github.com/apache/spark/pull/30623
>>>>>>>>>
>>>>>>>>> I can take a stab at this if the community thinks that this is a
>>> nice
>>>>>>>> thing
>>>>>>>>> to have.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Vihang
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>
>>
> 

Re: [DISCUSS] Nightly snaphot builds

Posted by vihang karajgaonkar <vi...@apache.org>.
I created https://issues.apache.org/jira/browse/HIVE-27371 to have nightly
builds for branch-3. Once that is merged, I think we can have scheduled
builds for branch-3 as well. However, I don't have permission to create a
new job for branch-3. Does anyone know how to do it?

Thanks,
Vihang

On Wed, May 24, 2023 at 10:07 AM vihang karajgaonkar <vi...@apache.org>
wrote:

> The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great. Can
> we have this for branch-3 as well since we have been backporting a lot of
> PRs to branch-3 lately.
>
> Thanks,
> Vihang
>
>
>
>
>
> On Wed, May 24, 2023 at 6:56 AM Zoltan Haindrich <ki...@rxd.hu> wrote:
>
>> Hey,
>>
>>  > We already have nightly builds for Hive [1].
>>  > [1] http://ci.hive.apache.org/job/hive-nightly/
>>
>> ...and hive-dev-box can launch such archives; either by using it like
>> this:
>> https://www.mail-archive.com/dev@hive.apache.org/msg142420.html
>>
>> or with a somewhat longer command you could launch hdb in bazaar mode;
>> and have an HS2 running with a nightly version:
>>
>> docker run --rm -d -p 10000:10000 -v hive-dev-box_work:/work \
>>   -e HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz \
>>   --name hive kgyrtkirk/hive-dev-box:bazaar
>>
>> cheers,
>> Zoltan
>>
>> On 5/24/23 09:15, Stamatis Zampetakis wrote:
>> > Hey all,
>> >
>> > We already have nightly builds for Hive [1].
>> >
>> > Do we need something more than that?
>> >
>> > Best,
>> > Stamatis
>> >
>> > [1] http://ci.hive.apache.org/job/hive-nightly/
>> >
>> >
>> > On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <
>> vihangk1@apache.org> wrote:
>> >>
>> >> I think there are many benefits like others in this thread suggested
>> which
>> >> can be built on top of nightly builds. Having docker images is great
>> but
>> >> for now I think we can start simple and publish the jars. Many users
>> still
>> >> just deploy using jars and it would be useful to them. Once we have a
>> >> docker environment we can add a docker image too to the nightly builds
>> so
>> >> that users can choose their preferred way.
>> >>
>> >> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park <gl...@gmail.com>
>> wrote:
>> >>
>> >>> I think such nightly builds will be useful for testing and debugging
>> in the
>> >>> future.
>> >>>
>> >>> I also wonder if we can somehow create builds even from previous
>> commits
>> >>> (e.g., for the past few years). Such builds from previous commits
>> don't
>> >>> have to be daily builds, and I think weekly builds (or even monthly
>> builds)
>> >>> would also be very useful.
>> >>>
>> >>> The reason I wish such builds were available is to facilitate
>> debugging and
>> >>> testing. When tested against the TPC-DS benchmark, the current master
>> >>> branch has several correctness problems that were introduced after the
>> >>> release of Hive 3.1.2. We have reported all problems known to us in
>> [1] and
>> >>> also submitted several patches. If such nightly builds had been
>> available,
>> >>> we would have saved quite a bit of time for implementing the patches
>> by
>> >>> quickly finding offending commits that introduced new correctness
>> bugs.
>> >>>
>> >>> In addition, you can find quite a few commits in the master branch
>> that
>> >>> report bugs which are not reproduced in Hive 3.1.2. Examples:
>> HIVE-19990,
>> >>> HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
>> >>> HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
>> >>> HIVE-25170, HIVE-25864, HIVE-26671.
>> >>> (There may be some errors in this list because we compared against
>> Hive
>> >>> 3.1.2 with many patches backported.) Such nightly builds can be
>> useful for
>> >>> finding root causes of such bugs.
>> >>>
>> >>> Ideally I wish there was an automated procedure to create nightly
>> builds,
>> >>> run TPC-DS benchmark, and report correctness/performance results,
>> although
>> >>> this would be quite hard to implement. (I remember Spark implemented
>> this
>> >>> procedure in the era of Spark 2, but my memory could be wrong.)
>> >>>
>> >>> [1] https://issues.apache.org/jira/browse/HIVE-26654
>> >>>
>> >>>
>> >>> On Tue, May 23, 2023 at 10:44 AM Ayush Saxena <ay...@gmail.com>
>> wrote:
>> >>>
>> >>>> Hi Vihang,
>> >>>> +1, We were even exploring publishing the docker images of the
>> snapshot
>> >>>> version as well per commit or maybe weekly, so just shoot 2 docker
>> >>> commands
>> >>>> and you get a Hive cluster running with master code.
>> >>>>
>> >>>> Sai, I think to spin up an env via Docker with all these things
>> should be
>> >>>> doable for sure, but would require someone with real good expertise
>> with
>> >>>> docker as well as setting up these services with Hive. Obviously, I
>> am
>> >>> not
>> >>>> that guy :-)
>> >>>>
>> >>>> @Simhadri has a PR which publishes docker images once a release tag
>> is
>> >>>> pushed, you can explore to have similar stuff for the Snapshot
>> version,
>> >>>> maybe if that sounds cool
>> >>>>
>> >>>> -Ayush
>> >>>>
>> >>>> On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
>> >>>> <sa...@cloudera.com.invalid> wrote:
>> >>>>
>> >>>>> Hi Vihang,
>> >>>>>
>> >>>>> +1 on the idea.
>> >>>>>
>> >>>>> This is a great idea to quickly test if a certain feature is
>> working as
>> >>>>> expected on a certain branch.
>> >>>>> This way we test data loss, correctness, or any other unexpected
>> >>>> scenarios
>> >>>>> that are Hive specific only. However, I'm wondering if it is
>> possible
>> >>> to
>> >>>>> deploy/test in a kerberized environment or issues involving
>> >>> authorization
>> >>>>> services like sentry/ranger.
>> >>>>>
>> >>>>> Thanks,
>> >>>>> Sai.
>> >>>>>
>> >>>>> On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <
>> >>>> vihangk1@apache.org>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Hello Team,
>> >>>>>>
>> >>>>>> I have observed that it is a common use-case where users would like
>> >>> to
>> >>>>> test
>> >>>>>> out unreleased features/bug fixes either to unblock them or test
>> out
>> >>> if
>> >>>>> the
>> >>>>>> bug fixes really work as intended in their environments. Today in
>> the
>> >>>>> case
>> >>>>>> of Apache Hive, this is not very user friendly because it requires
>> >>> the
>> >>>>> end
>> >>>>>> user to build the binaries directly from the hive source code.
>> >>>>>>
>> >>>>>> I found that Apache Spark has a very useful infrastructure [1]
>> which
>> >>>>>> deploys nightly snapshots [2] [3] from the branch using github
>> >>> actions.
>> >>>>>> This is super useful for any user who wants to try out the latest
>> and
>> >>>>>> greatest using the nightly builds.
>> >>>>>>
>> >>>>>> I was wondering if we should also adopt this. We can use github
>> >>> actions
>> >>>>> to
>> >>>>>> upload the snapshot jars to the public repository (e.g github
>> >>> packages)
>> >>>>> and
>> >>>>>> schedule it as a nightly job.
>> >>>>>>
>> >>>>>> [1] https://issues.apache.org/jira/browse/INFRA-21167
>> >>>>>> [2]
>> >>>> https://github.com/apache/spark/pkgs/container/apache-spark-ci-image
>> >>>>>> [3] https://github.com/apache/spark/pull/30623
>> >>>>>>
>> >>>>>> I can take a stab at this if the community thinks that this is a
>> nice
>> >>>>> thing
>> >>>>>> to have.
>> >>>>>>
>> >>>>>> Thanks,
>> >>>>>> Vihang
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>>
>

Re: [DISCUSS] Nightly snaphot builds

Posted by vihang karajgaonkar <vi...@apache.org>.
The nightly job http://ci.hive.apache.org/job/hive-nightly/ is great. Can
we have this for branch-3 as well, since we have been backporting a lot of
PRs to branch-3 lately?

Thanks,
Vihang





On Wed, May 24, 2023 at 6:56 AM Zoltan Haindrich <ki...@rxd.hu> wrote:

> Hey,
>
>  > We already have nightly builds for Hive [1].
>  > [1] http://ci.hive.apache.org/job/hive-nightly/
>
> ...and hive-dev-box can launch such archives; either by using it like this:
> https://www.mail-archive.com/dev@hive.apache.org/msg142420.html
>
> or with a somewhat longer command you could launch hdb in bazaar mode; and
> have an HS2 running with a nightly version:
>
> docker run --rm -d -p 10000:10000 -v hive-dev-box_work:/work \
>   -e HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz \
>   --name hive kgyrtkirk/hive-dev-box:bazaar
>
> cheers,
> Zoltan
>
> On 5/24/23 09:15, Stamatis Zampetakis wrote:
> > Hey all,
> >
> > We already have nightly builds for Hive [1].
> >
> > Do we need something more than that?
> >
> > Best,
> > Stamatis
> >
> > [1] http://ci.hive.apache.org/job/hive-nightly/
> >
> >
> > On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <vi...@apache.org>
> wrote:
> >>
> >> I think there are many benefits like others in this thread suggested
> which
> >> can be built on top of nightly builds. Having docker images is great but
> >> for now I think we can start simple and publish the jars. Many users
> still
> >> just deploy using jars and it would be useful to them. Once we have a
> >> docker environment we can add a docker image too to the nightly builds
> so
> >> that users can choose their preferred way.
> >>
> >> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park <gl...@gmail.com>
> wrote:
> >>
> >>> I think such nightly builds will be useful for testing and debugging
> in the
> >>> future.
> >>>
> >>> I also wonder if we can somehow create builds even from previous
> commits
> >>> (e.g., for the past few years). Such builds from previous commits don't
> >>> have to be daily builds, and I think weekly builds (or even monthly
> builds)
> >>> would also be very useful.
> >>>
> >>> The reason I wish such builds were available is to facilitate
> debugging and
> >>> testing. When tested against the TPC-DS benchmark, the current master
> >>> branch has several correctness problems that were introduced after the
> >>> release of Hive 3.1.2. We have reported all problems known to us in
> [1] and
> >>> also submitted several patches. If such nightly builds had been
> available,
> >>> we would have saved quite a bit of time for implementing the patches by
> >>> quickly finding offending commits that introduced new correctness bugs.
> >>>
> >>> In addition, you can find quite a few commits in the master branch that
> >>> report bugs which are not reproduced in Hive 3.1.2. Examples:
> HIVE-19990,
> >>> HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
> >>> HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
> >>> HIVE-25170, HIVE-25864, HIVE-26671.
> >>> (There may be some errors in this list because we compared against Hive
> >>> 3.1.2 with many patches backported.) Such nightly builds can be useful
> for
> >>> finding root causes of such bugs.
> >>>
> >>> Ideally I wish there was an automated procedure to create nightly
> builds,
> >>> run TPC-DS benchmark, and report correctness/performance results,
> although
> >>> this would be quite hard to implement. (I remember Spark implemented
> this
> >>> procedure in the era of Spark 2, but my memory could be wrong.)
> >>>
> >>> [1] https://issues.apache.org/jira/browse/HIVE-26654
> >>>
> >>>
> >>> On Tue, May 23, 2023 at 10:44 AM Ayush Saxena <ay...@gmail.com>
> wrote:
> >>>
> >>>> Hi Vihang,
> >>>> +1, We were even exploring publishing the docker images of the
> snapshot
> >>>> version as well per commit or maybe weekly, so just shoot 2 docker
> >>> commands
> >>>> and you get a Hive cluster running with master code.
> >>>>
> >>>> Sai, I think to spin up an env via Docker with all these things
> should be
> >>>> doable for sure, but would require someone with real good expertise
> with
> >>>> docker as well as setting up these services with Hive. Obviously, I am
> >>> not
> >>>> that guy :-)
> >>>>
> >>>> @Simhadri has a PR which publishes docker images once a release tag is
> >>>> pushed, you can explore to have similar stuff for the Snapshot
> version,
> >>>> maybe if that sounds cool
> >>>>
> >>>> -Ayush
> >>>>
> >>>> On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
> >>>> <sa...@cloudera.com.invalid> wrote:
> >>>>
> >>>>> Hi Vihang,
> >>>>>
> >>>>> +1 on the idea.
> >>>>>
> >>>>> This is a great idea to quickly test if a certain feature is working
> as
> >>>>> expected on a certain branch.
> >>>>> This way we test data loss, correctness, or any other unexpected
> >>>> scenarios
> >>>>> that are Hive specific only. However, I'm wondering if it is possible
> >>> to
> >>>>> deploy/test in a kerberized environment or issues involving
> >>> authorization
> >>>>> services like sentry/ranger.
> >>>>>
> >>>>> Thanks,
> >>>>> Sai.
> >>>>>
> >>>>> On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <
> >>>> vihangk1@apache.org>
> >>>>> wrote:
> >>>>>
> >>>>>> Hello Team,
> >>>>>>
> >>>>>> I have observed that it is a common use-case where users would like
> >>> to
> >>>>> test
> >>>>>> out unreleased features/bug fixes either to unblock them or test out
> >>> if
> >>>>> the
> >>>>>> bug fixes really work as intended in their environments. Today in
> the
> >>>>> case
> >>>>>> of Apache Hive, this is not very user friendly because it requires
> >>> the
> >>>>> end
> >>>>>> user to build the binaries directly from the hive source code.
> >>>>>>
> >>>>>> I found that Apache Spark has a very useful infrastructure [1] which
> >>>>>> deploys nightly snapshots [2] [3] from the branch using github
> >>> actions.
> >>>>>> This is super useful for any user who wants to try out the latest
> and
> >>>>>> greatest using the nightly builds.
> >>>>>>
> >>>>>> I was wondering if we should also adopt this. We can use github
> >>> actions
> >>>>> to
> >>>>>> upload the snapshot jars to the public repository (e.g github
> >>> packages)
> >>>>> and
> >>>>>> schedule it as a nightly job.
> >>>>>>
> >>>>>> [1] https://issues.apache.org/jira/browse/INFRA-21167
> >>>>>> [2]
> >>>> https://github.com/apache/spark/pkgs/container/apache-spark-ci-image
> >>>>>> [3] https://github.com/apache/spark/pull/30623
> >>>>>>
> >>>>>> I can take a stab at this if the community thinks that this is a
> nice
> >>>>> thing
> >>>>>> to have.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vihang
> >>>>>>
> >>>>>
> >>>>
> >>>
>

Re: [DISCUSS] Nightly snaphot builds

Posted by Zoltan Haindrich <ki...@rxd.hu>.
Hey,

 > We already have nightly builds for Hive [1].
 > [1] http://ci.hive.apache.org/job/hive-nightly/

...and hive-dev-box can launch such archives, either by using it like this:
https://www.mail-archive.com/dev@hive.apache.org/msg142420.html

or, with a somewhat longer command, you could launch hdb (hive-dev-box) in bazaar mode and have an HS2 running with a nightly version:

docker run --rm -d -p 10000:10000 -v hive-dev-box_work:/work \
  -e HIVE_VERSION=http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz \
  --name hive kgyrtkirk/hive-dev-box:bazaar
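
The archive name embeds the commit hash and a timestamp, so it changes with every nightly. A small wrapper that takes the archive URL as a parameter might be convenient. The sketch below uses a made-up function name, and it only prints the command (a dry run) rather than starting anything.

```shell
#!/bin/sh
# Hypothetical wrapper (the name hdb_nightly_cmd is an assumption):
# print the docker command for a given nightly archive URL.
#   $1 = URL of the apache-hive-*-bin.tar.gz archive to launch
# Nothing is executed here; the command line is only written to stdout.
hdb_nightly_cmd() {
    printf '%s\n' "docker run --rm -d -p 10000:10000 -v hive-dev-box_work:/work -e HIVE_VERSION=$1 --name hive kgyrtkirk/hive-dev-box:bazaar"
}

# Example with the archive from the command above:
hdb_nightly_cmd "http://ci.hive.apache.org/job/hive-nightly/lastSuccessfulBuild/artifact/archive/apache-hive-4.0.0-nightly-b0b3fde70c-20230524_014711-bin.tar.gz"
```

Piping the printed line to sh would start the container, with HS2 mapped to port 10000 on the host as in the command above.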

cheers,
Zoltan

On 5/24/23 09:15, Stamatis Zampetakis wrote:
> Hey all,
> 
> We already have nightly builds for Hive [1].
> 
> Do we need something more than that?
> 
> Best,
> Stamatis
> 
> [1] http://ci.hive.apache.org/job/hive-nightly/
> 
> 
> On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <vi...@apache.org> wrote:
>>
>> I think there are many benefits like others in this thread suggested which
>> can be built on top of nightly builds. Having docker images is great but
>> for now I think we can start simple and publish the jars. Many users still
>> just deploy using jars and it would be useful to them. Once we have a
>> docker environment we can add a docker image too to the nightly builds so
>> that users can choose their preferred way.
>>
>> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park <gl...@gmail.com> wrote:
>>
>>> I think such nightly builds will be useful for testing and debugging in the
>>> future.
>>>
>>> I also wonder if we can somehow create builds even from previous commits
>>> (e.g., for the past few years). Such builds from previous commits don't
>>> have to be daily builds, and I think weekly builds (or even monthly builds)
>>> would also be very useful.
>>>
>>> The reason I wish such builds were available is to facilitate debugging and
>>> testing. When tested against the TPC-DS benchmark, the current master
>>> branch has several correctness problems that were introduced after the
>>> release of Hive 3.1.2. We have reported all problems known to us in [1] and
>>> also submitted several patches. If such nightly builds had been available,
>>> we would have saved quite a bit of time for implementing the patches by
>>> quickly finding offending commits that introduced new correctness bugs.
>>>
>>> In addition, you can find quite a few commits in the master branch that
>>> report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
>>> HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
>>> HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
>>> HIVE-25170, HIVE-25864, HIVE-26671.
>>> (There may be some errors in this list because we compared against Hive
>>> 3.1.2 with many patches backported.) Such nightly builds can be useful for
>>> finding root causes of such bugs.
>>>
>>> Ideally I wish there was an automated procedure to create nightly builds,
>>> run TPC-DS benchmark, and report correctness/performance results, although
>>> this would be quite hard to implement. (I remember Spark implemented this
>>> procedure in the era of Spark 2, but my memory could be wrong.)
>>>
>>> [1] https://issues.apache.org/jira/browse/HIVE-26654
>>>
>>>
>>> On Tue, May 23, 2023 at 10:44 AM Ayush Saxena <ay...@gmail.com> wrote:
>>>
>>>> Hi Vihang,
>>>> +1, We were even exploring publishing the docker images of the snapshot
>>>> version as well per commit or maybe weekly, so just shoot 2 docker
>>> commands
>>>> and you get a Hive cluster running with master code.
>>>>
>>>> Sai, I think to spin up an env via Docker with all these things should be
>>>> doable for sure, but would require someone with real good expertise with
>>>> docker as well as setting up these services with Hive. Obviously, I am
>>> not
>>>> that guy :-)
>>>>
>>>> @Simhadri has a PR which publishes docker images once a release tag is
>>>> pushed, you can explore to have similar stuff for the Snapshot version,
>>>> maybe if that sounds cool
>>>>
>>>> -Ayush
>>>>
>>>> On Tue, 23 May 2023 at 04:26, Sai Hemanth Gantasala
>>>> <sa...@cloudera.com.invalid> wrote:
>>>>
>>>>> Hi Vihang,
>>>>>
>>>>> +1 on the idea.
>>>>>
>>>>> This is a great idea to quickly test if a certain feature is working as
>>>>> expected on a certain branch.
>>>>> This way we test data loss, correctness, or any other unexpected
>>>> scenarios
>>>>> that are Hive specific only. However, I'm wondering if it is possible
>>> to
>>>>> deploy/test in a kerberized environment or issues involving
>>> authorization
>>>>> services like sentry/ranger.
>>>>>
>>>>> Thanks,
>>>>> Sai.
>>>>>
>>>>> On Mon, May 22, 2023 at 11:15 AM vihang karajgaonkar <
>>>> vihangk1@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hello Team,
>>>>>>
>>>>>> I have observed that it is a common use-case where users would like
>>> to
>>>>> test
>>>>>> out unreleased features/bug fixes either to unblock them or test out
>>> if
>>>>> the
>>>>>> bug fixes really work as intended in their environments. Today in the
>>>>> case
>>>>>> of Apache Hive, this is not very user friendly because it requires
>>> the
>>>>> end
>>>>>> user to build the binaries directly from the hive source code.
>>>>>>
>>>>>> I found that Apache Spark has a very useful infrastructure [1] which
>>>>>> deploys nightly snapshots [2] [3] from the branch using github
>>> actions.
>>>>>> This is super useful for any user who wants to try out the latest and
>>>>>> greatest using the nightly builds.
>>>>>>
>>>>>> I was wondering if we should also adopt this. We can use github
>>> actions
>>>>> to
>>>>>> upload the snapshot jars to the public repository (e.g github
>>> packages)
>>>>> and
>>>>>> schedule it as a nightly job.
>>>>>>
>>>>>> [1] https://issues.apache.org/jira/browse/INFRA-21167
>>>>>> [2]
>>>> https://github.com/apache/spark/pkgs/container/apache-spark-ci-image
>>>>>> [3] https://github.com/apache/spark/pull/30623
>>>>>>
>>>>>> I can take a stab at this if the community thinks that this is a nice
>>>>> thing
>>>>>> to have.
>>>>>>
>>>>>> Thanks,
>>>>>> Vihang
>>>>>>
>>>>>
>>>>
>>>

Re: [DISCUSS] Nightly snaphot builds

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hey all,

We already have nightly builds for Hive [1].

Do we need something more than that?

Best,
Stamatis

[1] http://ci.hive.apache.org/job/hive-nightly/


On Tue, May 23, 2023 at 9:03 AM vihang karajgaonkar <vi...@apache.org> wrote:
>
> I think there are many benefits like others in this thread suggested which
> can be built on top of nightly builds. Having docker images is great but
> for now I think we can start simple and publish the jars. Many users still
> just deploy using jars and it would be useful to them. Once we have a
> docker environment we can add a docker image too to the nightly builds so
> that users can choose their preferred way.
>
> On Mon, May 22, 2023 at 11:07 PM Sungwoo Park <gl...@gmail.com> wrote:
>
> > I think such nightly builds will be useful for testing and debugging in the
> > future.
> >
> > I also wonder if we can somehow create builds even from previous commits
> > (e.g., for the past few years). Such builds from previous commits don't
> > have to be daily builds, and I think weekly builds (or even monthly builds)
> > would also be very useful.
> >
> > The reason I wish such builds were available is to facilitate debugging and
> > testing. When tested against the TPC-DS benchmark, the current master
> > branch has several correctness problems that were introduced after the
> > release of Hive 3.1.2. We have reported all problems known to us in [1] and
> > also submitted several patches. If such nightly builds had been available,
> > we would have saved quite a bit of time for implementing the patches by
> > quickly finding offending commits that introduced new correctness bugs.
> >
> > In addition, you can find quite a few commits in the master branch that
> > report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
> > HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
> > HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
> > HIVE-25170, HIVE-25864, HIVE-26671.
> > (There may be some errors in this list because we compared against Hive
> > 3.1.2 with many patches backported.) Such nightly builds can be useful for
> > finding root causes of such bugs.
> >
> > Ideally I wish there was an automated procedure to create nightly builds,
> > run TPC-DS benchmark, and report correctness/performance results, although
> > this would be quite hard to implement. (I remember Spark implemented this
> > procedure in the era of Spark 2, but my memory could be wrong.)
> >
> > [1] https://issues.apache.org/jira/browse/HIVE-26654
> >
> >
> > On Tue, May 23, 2023 at 10:44 AM Ayush Saxena <ay...@gmail.com> wrote:
> >
> > > Hi Vihang,
> > > +1, We were even exploring publishing the docker images of the snapshot
> > > version as well per commit or maybe weekly, so just shoot 2 docker
> > commands
> > > and you get a Hive cluster running with master code.
> > >
> > > Sai, I think to spin up an env via Docker with all these things should be
> > > doable for sure, but would require someone with real good expertise with
> > > docker as well as setting up these services with Hive. Obviously, I am
> > not
> > > that guy :-)
> > >

Re: [DISCUSS] Nightly snapshot builds

Posted by vihang karajgaonkar <vi...@apache.org>.
I think there are many benefits, as others in this thread suggested, that
can be built on top of nightly builds. Having Docker images is great, but
for now I think we can start simple and publish the jars. Many users still
deploy using just the jars, and nightly jars would be useful to them. Once
we have a Docker environment, we can add a Docker image to the nightly
builds too, so that users can choose their preferred way.


Re: [DISCUSS] Nightly snapshot builds

Posted by Sungwoo Park <gl...@gmail.com>.
I think such nightly builds will be useful for testing and debugging in the
future.

I also wonder if we can somehow create builds even from previous commits
(e.g., for the past few years). Such builds from previous commits don't
have to be daily builds, and I think weekly builds (or even monthly builds)
would also be very useful.

The reason I wish such builds were available is to facilitate debugging and
testing. When tested against the TPC-DS benchmark, the current master
branch has several correctness problems that were introduced after the
release of Hive 3.1.2. We have reported all problems known to us in [1] and
also submitted several patches. If such nightly builds had been available,
we would have saved quite a bit of time for implementing the patches by
quickly finding offending commits that introduced new correctness bugs.
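
To make the idea concrete, here is a minimal sketch of how a series of
archived builds could be searched for the first one that shows a problem.
It is only an illustration: the build labels and the is_bad() check are
hypothetical, and it assumes the builds are ordered oldest to newest and
that once a bug appears it stays in every later build.

```python
from typing import Callable, Sequence


def first_bad_build(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest build for which is_bad() returns True.

    Assumptions (hypothetical setup, not an existing Hive tool):
      - builds are ordered oldest to newest,
      - the newest build is known to be bad,
      - once a build is bad, every later build is bad as well.
    """
    if not builds:
        raise ValueError("no builds to search")
    lo, hi = 0, len(builds) - 1
    if not is_bad(builds[hi]):
        raise ValueError("the newest build does not show the problem")
    # Binary search: keep the invariant that builds[hi] is bad and
    # every build before builds[lo] is good.
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid        # first bad build is at mid or earlier
        else:
            lo = mid + 1    # first bad build is after mid
    return builds[lo]
```

With weekly or monthly builds, a search like this would only narrow the
problem down to the interval between two builds; the commits inside that
interval would still have to be checked, for example with git bisect.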

In addition, you can find quite a few commits in the master branch that
report bugs which are not reproduced in Hive 3.1.2. Examples: HIVE-19990,
HIVE-14557, HIVE-21132, HIVE-21188, HIVE-21544, HIVE-22114,
HIVE-22227, HIVE-22236, HIVE-23911, HIVE-24198, HIVE-22777,
HIVE-25170, HIVE-25864, HIVE-26671.
(There may be some errors in this list because we compared against Hive
3.1.2 with many patches backported.) Such nightly builds can be useful for
finding root causes of such bugs.

Ideally I wish there were an automated procedure to create nightly builds,
run the TPC-DS benchmark, and report correctness/performance results,
although this would be quite hard to implement. (I remember Spark
implemented this procedure in the era of Spark 2, but my memory could be
wrong.)

[1] https://issues.apache.org/jira/browse/HIVE-26654



Re: [DISCUSS] Nightly snapshot builds

Posted by Ayush Saxena <ay...@gmail.com>.
Hi Vihang,
+1. We were even exploring publishing Docker images of the snapshot
version as well, per commit or maybe weekly, so that you can just run two
docker commands and get a Hive cluster running with master code.

Sai, I think spinning up an env via Docker with all these things should be
doable for sure, but it would require someone with really good expertise in
Docker as well as in setting up these services with Hive. Obviously, I am
not that guy :-)

@Simhadri has a PR which publishes Docker images once a release tag is
pushed; you could explore doing something similar for the snapshot version,
if that sounds cool.

-Ayush


Re: [DISCUSS] Nightly snapshot builds

Posted by Sai Hemanth Gantasala <sa...@cloudera.com.INVALID>.
Hi Vihang,

+1 on the idea.

This is a great way to quickly test whether a certain feature is working as
expected on a certain branch.
This way we can test for data loss, correctness, or any other unexpected
scenarios that are Hive specific. However, I'm wondering if it is possible
to deploy/test in a kerberized environment, or to test issues involving
authorization services like Sentry/Ranger.

Thanks,
Sai.
