Posted to commits@arrow.apache.org by al...@apache.org on 2021/11/15 12:02:54 UTC

[arrow-datafusion] branch master updated: python: update release instructions & automation (#1295)

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow-datafusion.git


The following commit(s) were added to refs/heads/master by this push:
     new 05fdb7a  python: update release instructions & automation (#1295)
05fdb7a is described below

commit 05fdb7aefb1e3f9c1b170c70d998337c173eb12c
Author: QP Hou <qp...@scribd.com>
AuthorDate: Mon Nov 15 04:02:50 2021 -0800

    python: update release instructions & automation (#1295)
    
    * python: update release instructions & automation
    
    * add PMC member note
---
 .gitignore                              |   3 +
 dev/release/README.md                   |  62 +++++++++++++----
 dev/release/create-tarball.sh           |  18 ++++-
 dev/release/download-python-wheels.py   | 119 ++++++++++++++++++++++++++++++++
 dev/release/verify-release-candidate.sh |  17 +++--
 5 files changed, 197 insertions(+), 22 deletions(-)

diff --git a/.gitignore b/.gitignore
index 31bdf49..80a9cb6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -91,3 +91,6 @@ rusty-tags.vi
 
 .vscode
 venv/*
+
+# apache release artifacts
+dev/dist
diff --git a/dev/release/README.md b/dev/release/README.md
index 775678a..2127dc2 100644
--- a/dev/release/README.md
+++ b/dev/release/README.md
@@ -50,7 +50,7 @@ release backport branch.
 As part of the Apache governance model, official releases consist of signed
 source tarballs approved by the PMC.
 
-We then use the code in the approved source tarball to release to crates.io and
+We then use the code in the approved artifacts to release to crates.io and
 PyPI.
 
 ### Change Log
@@ -126,9 +126,9 @@ could change the change log content landed in the `master` branch before you
 could merge the PR, you need to rerun the changelog update script to regenerate
 the changelog and update the PR accordingly.
 
-## Prepare release candidate tarball
+## Prepare release candidate artifacts
 
-After the PR gets merged, you are ready to create a releaes tarball from the
+After the PR gets merged, you are ready to create release artifacts based on the
 merged commit.
 
 (Note you need to be a committer to run these scripts as they upload to the apache svn distribution servers)
@@ -139,7 +139,8 @@ Pick numbers in sequential order, with `0` for `rc0`, `1` for `rc1`, etc.
 
 ### Create git tag for the release:
 
-While the official release artifact is a signed tarball, we also tag the commit it was created for convenience and code archaeology.
+While the official release artifacts are signed tarballs and zip files, we also
+tag the commit it was created from, for convenience and code archaeology.
 
 Using a string such as `5.1.0` as the `<version>`, create and push the tag thusly:
 
@@ -150,24 +151,27 @@ git tag <version>-<rc> apache/master
 git push apache <version>
 ```
 
-### Create, sign, and upload tarball
+This should trigger the `Python Release Build` GitHub Actions workflow for the
+pushed tag. You can monitor the pipeline run status at https://github.com/apache/arrow-datafusion/actions/workflows/python_build.yml.
+
+### Create, sign, and upload artifacts
 
 Run `create-tarball.sh` with the `<version>` tag and `<rc>` you found in previous steps:
 
 ```shell
-./dev/release/create-tarball.sh 5.1.0 0
+GH_TOKEN=<TOKEN> ./dev/release/create-tarball.sh 5.1.0 0
 ```
 
 The `create-tarball.sh` script
 
-1. creates and uploads a release candidate tarball to the [arrow
+1. creates and uploads all release candidate artifacts to the [arrow
    dev](https://dist.apache.org/repos/dist/dev/arrow) location on the
    apache distribution svn server
 
 2. provides you an email template to
    send to dev@arrow.apache.org for release voting.
 
-### Vote on Release Candidate tarball
+### Vote on Release Candidate artifacts
 
 Send the email output from the script to dev@arrow.apache.org. The email should look like
 
@@ -181,7 +185,7 @@ I would like to propose a release of Apache Arrow Datafusion Implementation,
 version 5.1.0.
 
 This release candidate is based on commit: a5dd428f57e62db20a945e8b1895de91405958c4 [1]
-The proposed release tarball and signatures are hosted at [2].
+The proposed release artifacts and signatures are hosted at [2].
 The changelog is located at [3].
 
 Please download, verify checksums and signatures, run the unit tests,
 changes into master if there are any and try again with the next RC number.
 
 ## Finalize the release
 
+NOTE: steps in this section can only be done by PMC members.
+
 ### After the release is approved
 
-Move tarball to the release location in SVN, e.g.
+Move artifacts to the release location in SVN, e.g.
 https://dist.apache.org/repos/dist/release/arrow/arrow-datafusion-5.1.0/, using
 the `release-tarball.sh` script:
 
 Congratulations! The release is now official!
 Tag the same release candidate commit with the final release tag
 
 ```
-git co apache/5.1.0-RC0
+git co apache/5.1.0-rc0
 git tag 5.1.0
 git push apache 5.1.0
 ```
@@ -292,7 +298,20 @@ If there is a ballista release, run
 
 ### Publish on PyPI
 
-TODO
+Only approved releases of the source tarball and wheels should be published to
+PyPI, in order to conform to Apache Software Foundation governance standards.
+
+First, download all official python release artifacts:
+
+```shell
+svn co https://dist.apache.org/repos/dist/release/arrow/apache-arrow-datafusion-5.1.0-rc0/python ./python-artifacts
+```
+
+Use [twine](https://pypi.org/project/twine/) to perform the upload.
+
+```shell
+twine upload ./python-artifacts/*.{tar.gz,whl}
+```
 
 ### Call the vote
 
@@ -300,4 +319,21 @@ Call the vote on the Arrow dev list by replying to the RC voting thread. The
 reply should have a new subject constructed by adding `[RESULT]` prefix to the
 old subject line.
 
-TODO: add example mail
+Sample announcement template:
+
+```
+The vote has passed with <NUMBER> +1 votes. Thank you to all who helped
+with the release verification.
+```
+
+You can mention crates.io and PyPI version URLs in the email if applicable.
+
+```
+We have published new versions of datafusion and ballista to crates.io:
+
+https://crates.io/crates/datafusion/5.0.0
+https://crates.io/crates/ballista/0.5.0
+https://crates.io/crates/ballista-core/0.5.0
+https://crates.io/crates/ballista-executor/0.5.0
+https://crates.io/crates/ballista-scheduler/0.5.0
+```
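The voting email above asks reviewers to verify checksums; the `.sha256`/`.sha512` files use the two-space `shasum` output format (`<hexdigest>  <filename>`). A minimal verification sketch in Python, assuming the checksum file sits next to the artifact it names (the function name is illustrative):

```python
import hashlib
import os

def verify_checksum(checksum_path, algo="sha256"):
    # Checksum files contain "<hexdigest>  <filename>", as written by shasum.
    with open(checksum_path) as f:
        expected_hex, name = f.read().split()
    # Hash the named artifact in chunks to keep memory bounded.
    artifact = os.path.join(os.path.dirname(checksum_path), name)
    h = hashlib.new(algo)
    with open(artifact, "rb") as fd:
        for chunk in iter(lambda: fd.read(65536), b""):
            h.update(chunk)
    return h.hexdigest() == expected_hex
```

From the shell, `shasum -a 256 -c <file>.sha256` performs the same check.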
diff --git a/dev/release/create-tarball.sh b/dev/release/create-tarball.sh
index 94318d0..59214a5 100755
--- a/dev/release/create-tarball.sh
+++ b/dev/release/create-tarball.sh
@@ -36,6 +36,8 @@
 # 2. Logged into the apache svn server with the appropriate
 # credentials
 #
+# 3. Install the requests python package
+#
 #
 # Based in part on 02-source.sh from apache/arrow
 #
@@ -48,7 +50,12 @@ SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"
 if [ "$#" -ne 2 ]; then
     echo "Usage: $0 <version> <rc>"
     echo "ex. $0 4.1.0 2"
-  exit
+    exit
+fi
+
+if [[ -z "${GH_TOKEN}" ]]; then
+    echo "Please set a personal GitHub token through the GH_TOKEN environment variable"
+    exit
 fi
 
 version=$1
@@ -118,8 +125,15 @@ gpg --armor --output ${tarball}.asc --detach-sig ${tarball}
 (cd ${distdir} && shasum -a 256 ${tarname}) > ${tarball}.sha256
 (cd ${distdir} && shasum -a 512 ${tarname}) > ${tarball}.sha512
 
+# download python binary releases from Github Action
+python_distdir=${distdir}/python
+echo "Preparing python release artifacts"
+test -d ${python_distdir} || mkdir -p ${python_distdir}
+pushd "${python_distdir}"
+    python ${SOURCE_DIR}/download-python-wheels.py "${tag}"
+popd
+
 echo "Uploading to apache dist/dev to ${url}"
 svn co --depth=empty https://dist.apache.org/repos/dist/dev/arrow ${SOURCE_TOP_DIR}/dev/dist
 svn add ${distdir}
 svn ci -m "Apache Arrow Datafusion ${version} ${rc}" ${distdir}
-
diff --git a/dev/release/download-python-wheels.py b/dev/release/download-python-wheels.py
new file mode 100644
index 0000000..043cb92
--- /dev/null
+++ b/dev/release/download-python-wheels.py
@@ -0,0 +1,119 @@
+#!/usr/bin/env python
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Script that downloads python release artifacts from Github
+#
+# dependencies:
+# pip install requests
+
+
+import sys
+import os
+import argparse
+import requests
+import zipfile
+import subprocess
+import hashlib
+import io
+
+
+def main():
+    parser = argparse.ArgumentParser(
+        description='Download python binary wheels from release candidate workflow runs.')
+    parser.add_argument('tag', type=str, help='datafusion RC release tag')
+    args = parser.parse_args()
+
+    tag = args.tag
+    ghp_token = os.environ.get("GH_TOKEN")
+    if not ghp_token:
+        print(
+            "ERROR: Personal Github token is required to download workflow artifacts. "
+            "Please specify a token through the GH_TOKEN environment variable.")
+        sys.exit(1)
+
+    print(f"Downloading latest python wheels for RC tag {tag}...")
+
+    headers = {
+        "Accept": "application/vnd.github.v3+json",
+        "Authorization": f"token {ghp_token}",
+    }
+    url = f"https://api.github.com/repos/apache/arrow-datafusion/actions/runs?branch={tag}"
+    resp = requests.get(url, headers=headers)
+    resp.raise_for_status()
+
+    artifacts_url = None
+    for run in resp.json()["workflow_runs"]:
+        if run["name"] != "Python Release Build":
+            continue
+        artifacts_url = run["artifacts_url"]
+
+    if artifacts_url is None:
+        print("ERROR: Could not find python wheel binaries from Github Action run")
+        sys.exit(1)
+    print(f"Found artifacts url: {artifacts_url}")
+
+    download_url = None
+    artifacts = requests.get(artifacts_url, headers=headers).json()["artifacts"]
+    for artifact in artifacts:
+        if artifact["name"] != "dist":
+            continue
+        download_url = artifact["archive_download_url"]
+
+    if download_url is None:
+        print(f"ERROR: Could not resolve python wheel download URL from list of artifacts: {artifacts}")
+        sys.exit(1)
+    print(f"Extracting archive from: {download_url}...")
+
+    resp = requests.get(download_url, headers=headers, stream=True)
+    resp.raise_for_status()
+    zf = zipfile.ZipFile(io.BytesIO(resp.content))
+    zf.extractall("./")
+
+    for entry in os.listdir("./"):
+        if entry.endswith(".whl") or entry.endswith(".tar.gz"):
+            print(f"Sign and checksum artifact: {entry}")
+            subprocess.check_output([
+                "gpg", "--armor",
+                "--output", entry+".asc",
+                "--detach-sig", entry,
+            ])
+
+            sha256 = hashlib.sha256()
+            sha512 = hashlib.sha512()
+            with open(entry, "rb") as fd:
+                while True:
+                    data = fd.read(65536)
+                    if not data:
+                        break
+                    sha256.update(data)
+                    sha512.update(data)
+            with open(entry+".sha256", "w") as fd:
+                fd.write(sha256.hexdigest())
+                fd.write("  ")
+                fd.write(entry)
+                fd.write("\n")
+            with open(entry+".sha512", "w") as fd:
+                fd.write(sha512.hexdigest())
+                fd.write("  ")
+                fd.write(entry)
+                fd.write("\n")
+
+
+if __name__ == "__main__":
+    main()
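The two lookup loops in the script above are pure functions of the Actions API responses; pulled out as a sketch for clarity (function names are illustrative, the JSON shapes follow the GitHub v3 API payloads the script consumes):

```python
def find_artifacts_url(runs_payload, workflow_name="Python Release Build"):
    # Return the artifacts_url of the last matching workflow run, or None.
    url = None
    for run in runs_payload.get("workflow_runs", []):
        if run["name"] == workflow_name:
            url = run["artifacts_url"]
    return url

def find_dist_download_url(artifacts_payload, artifact_name="dist"):
    # Return the archive_download_url of the "dist" artifact, or None.
    url = None
    for artifact in artifacts_payload.get("artifacts", []):
        if artifact["name"] == artifact_name:
            url = artifact["archive_download_url"]
    return url
```

Separating the parsing from the `requests` calls makes the selection logic easy to exercise without network access.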
diff --git a/dev/release/verify-release-candidate.sh b/dev/release/verify-release-candidate.sh
index a37b6ff..5ac7b23 100755
--- a/dev/release/verify-release-candidate.sh
+++ b/dev/release/verify-release-candidate.sh
@@ -67,9 +67,7 @@ fetch_archive() {
   download_rc_file ${dist_name}.tar.gz.asc
   download_rc_file ${dist_name}.tar.gz.sha256
   download_rc_file ${dist_name}.tar.gz.sha512
-  gpg --verify ${dist_name}.tar.gz.asc ${dist_name}.tar.gz
-  ${sha256_verify} ${dist_name}.tar.gz.sha256
-  ${sha512_verify} ${dist_name}.tar.gz.sha512
+  verify_dir_artifact_signatures
 }
 
 verify_dir_artifact_signatures() {
@@ -82,9 +80,7 @@ verify_dir_artifact_signatures() {
     # basename of the artifact
     pushd $(dirname $artifact)
     base_artifact=$(basename $artifact)
-    if [ -f $base_artifact.sha256 ]; then
-      ${sha256_verify} $base_artifact.sha256 || exit 1
-    fi
+    ${sha256_verify} $base_artifact.sha256 || exit 1
     ${sha512_verify} $base_artifact.sha512 || exit 1
     popd
   done
@@ -150,7 +146,14 @@ import_gpg_keys
 fetch_archive ${dist_name}
 tar xf ${dist_name}.tar.gz
 pushd ${dist_name}
-test_source_distribution
+    test_source_distribution
+popd
+
+echo "Verifying python artifacts..."
+svn co $ARROW_DIST_URL/apache-arrow-datafusion-${VERSION}-rc${RC_NUMBER}/python python-artifacts
+pushd python-artifacts
+    verify_dir_artifact_signatures
+    twine check *.{whl,tar.gz}
 popd
 
 TEST_SUCCESS=yes