Posted to dev@arrow.apache.org by Andrew Lamb <al...@influxdata.com> on 2021/07/23 19:57:38 UTC

[Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

Does anyone want to make a DataFusion / Ballista official release (and then
subsequent release to crates.io)?  There is now a ticket [1] to track this
work. I think it would be great to do if someone has time. There are all
sorts of great features that have gone in since 4.0.0.

I don't have much time to devote to the release management of DataFusion /
Ballista in the near term (as my project uses DataFusion master and my
release management budget is already spent on managing arrow-rs releases).

Andrew

[1] https://github.com/apache/arrow-datafusion/issues/771

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

Posted by Andrew Lamb <al...@influxdata.com>.
I also think it sounds like a good process to get all the various packages
released in a timely manner. Thank you for taking point on this issue.

Andrew

On Sun, Aug 1, 2021 at 11:24 PM Andy Grove <an...@gmail.com> wrote:

> Thanks QP. This seems reasonable to me.

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

Posted by Andy Grove <an...@gmail.com>.
Thanks QP. This seems reasonable to me.

On Sun, Aug 1, 2021, 3:24 PM QP Hou <ho...@gmail.com> wrote:

> Summarizing the discussed proposal in our Github issue [1] for broader
> discussion and review on the dev list.
>
> The current arrow-datafusion repo contains the following high level
> subprojects: datafusion, datafusion python binding and ballista.
>
> In order to be able to release ballista and datafusion python binding
> with semantic versioning, I propose we decouple subproject versions
> from each other. As a result, we will be able to release a breaking
> change in datafusion without forcing a major version bump in ballista
> or python binding if that breaking change is not visible to their
> consumers.
>
> To reduce release overhead, we will still vote on the whole
> arrow-datafusion repo on every release. From the same release tarball,
> we can then release these sub-projects to their language specific
> registries (crates.io and pypi) with their own versions.
>
> Take the upcoming datafusion 5.0.0 release as an example. Within the
> same source release, we also have the code for ballista-0.5.0 and
> datafusion-python-0.3.0. We only need to vote on a signed
> apache-arrow-datafusion-5.0.0.tar.gz tarball.
>
> Consequence of this process is every time we need to release a new
> version of the python binding or ballista, we need to trigger a new
> datafusion release as well. However, datafusion release won't require
> a new release from the other two subprojects. For example, datafusion
> 5.1.0 release can just include a datafusion python release 0.4.0
> without a ballista release. In that case, we will just skip crates.io
> publish for ballista.
>
> Here is what the release process will look like:
>
> * Send a PR with the following changes to prepare the source tree for
> a new release:
>     - Update versions in Cargo.toml files
>     - Run automation script to generate
> {datafusion,python,ballista}/CHANGELOG.md
> * After PR gets merged, push git tag x.y.z to Github
> * Run dev/release/create-tarball.sh to create and upload a signed
> tarball for voting in the dev list
> * After vote passed, run ./dev/release/release-tarball.sh to move
> approved tarball to the release location in SVN
> * Unpack released tarball and release subproject to language specific
> registries:
>     - run `cargo publish` in datafusion to release datafusion to crates.io
>     - if there is a new ballista release
>         - run `cargo publish` in
> ballista/rust/{client,core,executor,scheduler} folders to release
> ballista to crates.io
>         - push `ballista-x.y.z` tag to Github
>     - if there is a new datafusion python release
>         - run `maturin publish` in python folder to release datafusion
> python binding to pypi
>         - release python documentation
>         - push `python-x.y.z` tag to Github
>
> I would like to get some feedback on this proposal since it is a
> little bit different from other Arrow projects. But I do think this
> will provide a better dependency pinning experience and changelog
> tracking for those sub-projects' downstream consumers.
>
> [1]: https://github.com/apache/arrow-datafusion/issues/771

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

Posted by QP Hou <ho...@gmail.com>.
Summarizing the proposal discussed in our GitHub issue [1] for broader
discussion and review on the dev list.

The current arrow-datafusion repo contains the following high-level
subprojects: datafusion, the datafusion python binding, and ballista.

In order to be able to release ballista and datafusion python binding
with semantic versioning, I propose we decouple subproject versions
from each other. As a result, we will be able to release a breaking
change in datafusion without forcing a major version bump in ballista
or python binding if that breaking change is not visible to their
consumers.
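
To illustrate the decoupling, here is a minimal Python sketch. The `bump`
helper and the starting version numbers are made up for illustration; they
are not part of the proposal or of any existing release tooling.

```python
def bump(version: str, change: str) -> str:
    """Return the next semantic version for a 'major', 'minor' or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        return f"{major + 1}.0.0"
    if change == "minor":
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change kind: {change!r}")


# Hypothetical starting versions, for illustration only.
versions = {"datafusion": "4.0.0", "ballista": "0.4.0", "python": "0.2.0"}

# A breaking change lands in datafusion but is not visible to consumers of
# ballista or the python binding, so only datafusion takes a major bump.
changes = {"datafusion": "major", "ballista": "patch", "python": "patch"}

next_versions = {name: bump(versions[name], changes[name]) for name in versions}
print(next_versions)
```

With coupled versions, all three entries would have had to move to a new
major version together.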

To reduce release overhead, we will still vote on the whole
arrow-datafusion repo on every release. From the same release tarball,
we can then release these subprojects to their language-specific
registries (crates.io and pypi) with their own versions.

Take the upcoming datafusion 5.0.0 release as an example. Within the
same source release, we also have the code for ballista-0.5.0 and
datafusion-python-0.3.0. We only need to vote on a signed
apache-arrow-datafusion-5.0.0.tar.gz tarball.

A consequence of this process is that every time we need to release a
new version of the python binding or ballista, we need to trigger a new
datafusion release as well. However, a datafusion release won't require
a new release from the other two subprojects. For example, a datafusion
5.1.0 release can include just a datafusion python 0.4.0 release
without a ballista release. In that case, we will simply skip the
crates.io publish for ballista.
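
One way to picture this is to compare the subproject versions carried by
two consecutive source releases and publish only those that changed. The
sketch below is an assumption about how that check could look; the
function name and the dictionary layout are hypothetical, and the version
numbers are the ones from the 5.0.0 / 5.1.0 example.

```python
def subprojects_to_publish(previous: dict, current: dict) -> list:
    """Return the subprojects whose version changed between two source releases."""
    return [name for name, version in current.items() if previous.get(name) != version]


# Versions carried by the datafusion 5.0.0 source release.
release_5_0_0 = {"datafusion": "5.0.0", "ballista": "0.5.0", "datafusion-python": "0.3.0"}

# The 5.1.0 example: a new python binding release, no new ballista release.
release_5_1_0 = {"datafusion": "5.1.0", "ballista": "0.5.0", "datafusion-python": "0.4.0"}

to_publish = subprojects_to_publish(release_5_0_0, release_5_1_0)
print(to_publish)  # ['datafusion', 'datafusion-python']
```

ballista is absent from the result because its version did not change, so
its crates.io publish is skipped.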

Here is what the release process will look like:

* Send a PR with the following changes to prepare the source tree for
  a new release:
    - Update versions in Cargo.toml files
    - Run the automation script to generate
      {datafusion,python,ballista}/CHANGELOG.md
* After the PR is merged, push git tag x.y.z to GitHub
* Run dev/release/create-tarball.sh to create and upload a signed
  tarball for voting on the dev list
* After the vote passes, run ./dev/release/release-tarball.sh to move
  the approved tarball to the release location in SVN
* Unpack the released tarball and release the subprojects to their
  language-specific registries:
    - run `cargo publish` in datafusion to release datafusion to crates.io
    - if there is a new ballista release
        - run `cargo publish` in the
          ballista/rust/{client,core,executor,scheduler} folders to
          release ballista to crates.io
        - push the `ballista-x.y.z` tag to GitHub
    - if there is a new datafusion python release
        - run `maturin publish` in the python folder to release the
          datafusion python binding to pypi
        - release the python documentation
        - push the `python-x.y.z` tag to GitHub
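
The post-vote part of the list above can be sketched as a small planning
function. This is only an illustration: `publish_steps` is a hypothetical
helper that returns the steps as text and executes nothing, and the
ballista crate order is simply copied from the list (the real order would
have to respect the dependencies between those crates).

```python
def publish_steps(new_ballista: bool, new_python: bool) -> list:
    """List the post-vote publish steps in order as (where, what) pairs."""
    # datafusion is published on every release.
    steps = [("datafusion", "cargo publish")]
    if new_ballista:
        # Order copied from the proposal, not derived from crate dependencies.
        for crate in ("client", "core", "executor", "scheduler"):
            steps.append((f"ballista/rust/{crate}", "cargo publish"))
        steps.append(("repo", "push ballista-x.y.z tag"))
    if new_python:
        steps.append(("python", "maturin publish"))
        steps.append(("python", "release python documentation"))
        steps.append(("repo", "push python-x.y.z tag"))
    return steps


# The 5.1.0 example: new python binding release, no ballista release.
for where, what in publish_steps(new_ballista=False, new_python=True):
    print(f"{where}: {what}")
```

Keeping the plan as data makes it easy to review which registries a given
release will touch before anything is published.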

I would like to get some feedback on this proposal since it is a
little bit different from other Arrow projects. But I do think this
will provide a better dependency pinning experience and changelog
tracking for those subprojects' downstream consumers.

[1]: https://github.com/apache/arrow-datafusion/issues/771


On Tue, Jul 27, 2021 at 4:18 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> Thanks to you both -- this sounds great.

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

Posted by Andrew Lamb <al...@influxdata.com>.
Thanks to you both -- this sounds great.

On Tue, Jul 27, 2021 at 8:37 AM Jiayu Liu <ji...@gmail.com> wrote:

> Not sure it's necessarily bundled together but I believe a Python,
> documentation, etc. release can also be helpful. I can volunteer to help if
> somehow these works can be parallelized.

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

Posted by Jiayu Liu <ji...@gmail.com>.
Not sure it's necessarily bundled together but I believe a Python,
documentation, etc. release can also be helpful. I can volunteer to help if
somehow these works can be parallelized.

On Tue, Jul 27, 2021 at 3:29 PM QP Hou <ho...@gmail.com> wrote:

> Following up on this, since delta-rs could really benefit from this
> release, I have started some initial work with
> https://github.com/apache/arrow-datafusion/pull/780 to move things
> forward. Others are welcome to join the party.

Re: [Rust][DataFusion] [DISCUSS] Next DataFusion / Ballista official release

Posted by QP Hou <ho...@gmail.com>.
Following up on this, since delta-rs could really benefit from this
release, I have started some initial work with
https://github.com/apache/arrow-datafusion/pull/780 to move things
forward. Others are welcome to join the party.

On Fri, Jul 23, 2021 at 12:58 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> Does anyone want to make a DataFusion / Ballista official release (and then
> subsequent release to crates.io)?  There is now a ticket [1] to track this
> work. I think it would be great to do if someone has time. There are all
> sorts of great features that have gone in since 4.0.0.
>
> I don't have much time to devote to the release management of DataFusion /
> Ballista in the near term (as my project uses DataFusion master and my
> release management budget is already spent on managing arrow-rs releases).
>
> Andrew
>
> [1] https://github.com/apache/arrow-datafusion/issues/771