Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/19 21:18:56 UTC

[GitHub] [arrow] nealrichardson commented on a diff in pull request #13149: ARROW-16403:[R][CI] Create Crossbow task for R nightly builds

nealrichardson commented on code in PR #13149:
URL: https://github.com/apache/arrow/pull/13149#discussion_r877538871


##########
dev/tasks/macros.jinja:
##########
@@ -221,3 +222,35 @@ on:
         cp ${formula} $(brew --repository homebrew/core)/Formula/
       done
 {% endmacro %}
+
+{%- macro change_r_pkg_version(date = '$(date +%Y%m%d)') -%}
+  - name: Modify version
+    shell: bash
+    run: |
+      cd arrow/r
+      sed -i.bak -E -e \
+        's/(^Version: [0-9]+\.[0-9]+\.[0-9]+).*$/\1.'"{{ date }}"'/' \
+        DESCRIPTION
+      head DESCRIPTION
+      rm -f DESCRIPTION.bak
+      cp ../dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb tools/apache-arrow.rb
+      
+      # Pin the git commit in the formula to match
+      cd tools
+      sed -i.bak -E -e 's/arrow.git"$/arrow.git", :revision => "'"{{ arrow.head }}"'"/' apache-arrow.rb
+      rm -f apache-arrow.rb.bak
+{% endmacro %}
+
+{%- macro test_r_src_pkg() -%}
+  source("https://raw.githubusercontent.com/apache/arrow/master/ci/etc/rprofile")
+  options(arrow.dev_repo = "https://nightly.wujciak.de/r")
+  
+  install.packages(
+    "arrow",
+    repos = c("https://nightly.wujciak.de/r", getOption("repos")),

Review Comment:
   ```suggestion
       repos = c(getOption("arrow.dev_repo"), getOption("repos")),
   ```



##########
dev/tasks/r/github.nightly.yml:
##########
@@ -0,0 +1,381 @@
+# Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   This file should not be called "nightly" because it can also be run on a PR. Maybe something about "packages"?



##########
dev/tasks/r/github.nightly.yml:
##########
@@ -0,0 +1,381 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+{% import 'macros.jinja' as macros with context %}
+{% set r_root = "nightly/r" %}
+{% set webdav_domain = "https://nightly.wujciak.de:8080" %}
+{% set repo_domain = "https://nightly.wujciak.de/r" %}
+
+{{ macros.github_header() }}
+
+jobs:
+  source:
+    name: Source Package
+    runs-on: ubuntu-latest
+    outputs:
+      version: {{ '${{ steps.save-version.outputs.version }}' }}
+      date: {{ '${{ steps.save-version.outputs.date }}' }}
+    steps:
+      {{ macros.github_checkout_arrow()|indent }}
+      {{ macros.change_r_pkg_version()|indent }}
+      - name: Save Version
+        id: save-version
+        shell: bash
+        run: | 
+          echo "::set-output name=version::$(grep ^Version arrow/r/DESCRIPTION | sed s/Version:\ //)"
+          echo "::set-output name=date::$(date +%Y%m%d)"
+
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          install-r: false
+
+      - name: Build R source package
+        shell: bash
+        run: |
+          cd arrow/r
+          # Copy in the Arrow C++ source
+          make sync-cpp
+          R CMD build --no-build-vignettes .
+
+      - name: Install davfs2 & Mount Repo
+        run: | 
+          sudo apt update && sudo apt install davfs2
+          mkdir nightly
+          sudo bash -c 'echo "{{ webdav_domain }} {{ '${{ secrets.CROSSBOW_NIGHTLIES_USER }} ${{ secrets.CROSSBOW_NIGHTLIES_TOKEN }}' }}" >> /etc/davfs2/secrets'
+          sudo mount -t davfs {{ webdav_domain }} nightly -o rw
+
+      - name: Upload Source Package
+        run: | 
+          # ensure repo structure is set up, this job is the fastest and should fix any issue before the other jobs try
+          # to push into non existent folder with curl, which would fail silently. 
+          sudo mkdir -p {{ r_root }}/src/contrib 
+        {% for os in ["ubuntu-18.04", 
+                      "centos-7",
+                      "windows"] %}     
+          sudo mkdir -p {{ r_root }}/libarrow/bin/{{ os }}
+        {% endfor %}
+        {% for os in ["windows", "macosx", "macosx/big-sur-arm64"] %}     
+          {% for r_version in ["4.1", "4.2"] %}
+          sudo mkdir -p {{ r_root }}/bin/{{ os }}/contrib/{{ r_version }}
+          {% endfor %}
+        {% endfor %}
+        
+          sudo cp arrow/r/arrow_*.tar.gz {{ r_root }}/src/contrib
+      - name: Update Repo
+        shell: sudo Rscript {0}
+        run: |
+          if(file.exists("{{ r_root }}/src/contrib/PACKAGES")) {
+            tools::update_PACKAGES("{{ r_root }}/src/contrib" , type = "source", latestOnly = FALSE)
+          } else {
+            tools::write_PACKAGES("{{ r_root }}/src/contrib" , type = "source", latestOnly = FALSE)
+          }
+
+      # ensures all changes are written
+      - run: sudo umount nightly 
+      - name: Upload binary artifact (temp)
+        uses: actions/upload-artifact@v3
+        with:
+          name: r-src-pkg
+          path: arrow/r/arrow_*.tar.gz
+
+  linux-cpp:
+    name: C++ Binary {{ '${{ matrix.config.os }}-${{ matrix.config.version }}' }}
+    runs-on: ubuntu-latest
+    needs: source
+    strategy:
+      fail-fast: false
+      matrix:
+        config:
+          - { os: ubuntu, version: "18.04" }
+          - { os: centos, version: "7" }
+    env:
+      UBUNTU: {{ '${{ matrix.config.version }}' }}
+      R: 3.6
+    steps:
+      {{ macros.github_checkout_arrow()|indent }}
+      {{ macros.change_r_pkg_version('${{ needs.source.outputs.date }}')|indent }}
+      {{ macros.github_install_archery()|indent }}
+      - name: Build libarrow
+        shell: bash
+        run: |
+          sudo sysctl -w kernel.core_pattern="core.%e.%p"
+          ulimit -c unlimited
+          archery docker run  {{ '${{ matrix.config.os}}' }}-cpp-static
+      - name: Bundle and upload
+        shell: bash
+        env:
+          USER: {{ '${{ secrets.CROSSBOW_NIGHTLIES_USER }}' }}
+          PW: {{ '${{ secrets.CROSSBOW_NIGHTLIES_TOKEN }}' }}
+          VERSION: {{ '${{ needs.source.outputs.version }}' }}
+        run: |
+          cd arrow/r
+          VERSION=$(grep ^Version DESCRIPTION | sed s/Version:\ //)
+          export PKG_FILE="arrow-${VERSION}.zip"
+
+          cd libarrow/dist
+          # These files were created by the docker user so we have to sudo to get them
+          sudo -E zip -r $PKG_FILE lib/ include/
+          export REPO_PATH={{ 'r/libarrow/bin/${{ matrix.config.os }}-${{ matrix.config.version }}' }}
+
+          curl -s --fail --show-error -u $USER:$PW -T $PKG_FILE  {{ webdav_domain }}/$REPO_PATH/
+      - name: Upload binary artifact (temp)
+        uses: actions/upload-artifact@v3
+        with:
+          name: r-{{ '${{ matrix.config.os}}' }}-libarrow
+          path: arrow/r/libarrow/dist/arrow-*.zip
+
+  windows-cpp:
+    name: C++ Binary Windows RTools (40 only)
+    needs: source
+    runs-on: windows-latest
+    steps:
+      - run: git config --global core.autocrlf false
+      {{ macros.github_checkout_arrow()|indent }}
+      {{ macros.change_r_pkg_version('${{ needs.source.outputs.date }}')|indent }}
+
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          rtools-version: 40
+          r-version: "4.0"
+          Ncpus: 2
+
+      - name: Build Arrow C++ with rtools40
+        shell: bash
+        env:
+          ARROW_HOME: "arrow"
+        run: arrow/ci/scripts/r_windows_build.sh
+
+      - name: Upload Binary
+        shell: bash
+        env:
+          USER: {{ '${{ secrets.CROSSBOW_NIGHTLIES_USER }}' }}
+          PW: {{ '${{ secrets.CROSSBOW_NIGHTLIES_TOKEN }}' }}
+        run: |
+          VERSION=$(grep ^Version arrow/r/DESCRIPTION | sed s/Version:\ //)
+          cd build
+          curl -s --fail --show-error -u $USER:$PW -T arrow-$VERSION.zip  {{ webdav_domain }}/r/libarrow/bin/windows/
+
+      - name: Upload binary artifact (temp)
+        uses: actions/upload-artifact@v3
+        with:
+          name: r-windows-libarrow
+          path: build/arrow-*.zip 
+
+  r-packages:
+    if: true
+    needs: [source, windows-cpp]
+    name: {{ '${{ matrix.platform }} ${{ matrix.r_version.r }}' }}
+    runs-on: {{ '${{ matrix.platform }}' }}
+    strategy:
+      fail-fast: false
+      matrix:
+        platform:
+          - windows-latest
+          # This is newer than what CRAN builds on, but Travis is no longer an option for us, so...
+          - macos-10.15
+          # - devops-managed # No M1 until the runner application runs native
+        r_version:
+          - { rtools: 40, r: "4.1" }
+          - { rtools: 42, r: "4.2" }
+    steps:
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          r-version: {{ '${{ matrix.r_version.r }}' }}
+          rtools-version: {{ '${{ matrix.r_version.rtools }}' }}
+          Ncpus: 2
+      - name: Build Binary
+        shell: Rscript {0}
+        run: |
+          on_windows <- tolower(Sys.info()[["sysname"]]) == "windows"
+
+          # Install dependencies by installing (yesterday's) binary, then removing it
+          install.packages(c("arrow", "cpp11"),
+            type = "binary",
+            # TODO replace with {{ repo_domain }}
+            repos = c("https://arrow-r-nightly.s3.amazonaws.com", "https://cloud.r-project.org")
+          )
+          remove.packages("arrow")
+
+          # Build
+          Sys.setenv(MAKEFLAGS = paste0("-j", parallel::detectCores()))
+          INSTALL_opts <- "--build"
+          if (!on_windows) {
+            # Windows doesn't support the --strip arg
+            INSTALL_opts <- c(INSTALL_opts, "--strip")
+          }
+
+          options(arrow.dev_repo = "https://nightly.wujciak.de/r")
+          install.packages(
+            "arrow",
+            type = "source",
+            repos = "{{ repo_domain }}",
+            INSTALL_opts = INSTALL_opts
+          )
+
+          # Test
+          library(arrow)
+          read_parquet(system.file("v0.7.1.parquet", package = "arrow"))
+      - name: Upload package
+        shell: bash
+        env:
+          USER: {{ '${{ secrets.CROSSBOW_NIGHTLIES_USER }}' }}
+          PW: {{ '${{ secrets.CROSSBOW_NIGHTLIES_TOKEN }}' }}
+          VERSION: {{ '${{ needs.source.outputs.version }}' }}
+        run: |
+          REPO_PATH=r$(Rscript -e "cat(contrib.url('', type = 'binary'))")
+          EXT=$(if [[ {{ '${{ matrix.platform }}' }} == windows* ]]; then echo zip; else echo tgz; fi)
+
+          curl -s --fail --show-error -u $USER:$PW -T arrow_$VERSION.$EXT {{ webdav_domain }}/$REPO_PATH/
+      - name: Upload binary artifact (temp)
+        uses: actions/upload-artifact@v3
+        with:
+          name: r-{{ '${{ matrix.config.os}}' }}-pkg
+          path: arrow_*.zip 
+
+  test-linux-binary:
+    if: true
+    needs: [source, linux-cpp]
+    name: Test binary {{ '${{ matrix.image }}' }}
+    runs-on: ubuntu-latest
+    container: {{ '${{ matrix.image }}' }}
+    strategy:
+      fail-fast: false
+      matrix:
+        image:
+          - "rhub/ubuntu-gcc-release" # ubuntu-20.04 (focal)
+          - "rstudio/r-base:4.1-bionic"
+          - "rstudio/r-base:4.2-centos7"
+          - "rocker/r-ver:3.6.3" # for debian:buster (10)
+          - "rocker/r-ver" # ubuntu-20.04
+          - "rhub/fedora-clang-devel" # tests distro-map.csv, mapped t
+    steps:
+      - name: Install system requirements
+        shell: bash
+        run: |
+          if [ "`which dnf`" ]; then
+            dnf install -y libcurl-devel openssl-devel
+          elif [ "`which yum`" ]; then
+            yum install -y libcurl-devel openssl-devel
+          elif [ "`which zypper`" ]; then
+            zypper install -y libcurl-devel libopenssl-devel
+          else
+            apt-get update
+            apt-get install -y libcurl4-openssl-dev libssl-dev
+          fi
+
+          # Add R-devel to PATH
+          echo "/opt/R-devel/bin" >> $GITHUB_PATH
+
+      - name: Set dev repo
+        shell: bash
+        run: |
+          echo 'options(arrow.dev_repo = "{{ repo_domain }}")' >> ~/.Rprofile
+      - name: Install arrow from our repo
+        env:
+          LIBARROW_BUILD: "FALSE"
+          LIBARROW_BINARY: "TRUE"
+        shell: Rscript {0}
+        run: |
+          {{  macros.test_r_src_pkg()|indent(8) }}
+
+  test-source:
+    #TODO Make sure we don't install arrow from CRAN if the repo fails

Review Comment:
   If you install the dependencies from CRAN (or `install.packages("arrow")` from CRAN, then `remove.packages("arrow")`, like we do elsewhere) and then run `install.packages("arrow", repos = nightly_repo)`, there is no CRAN fallback, because `repos` lists only the nightly repo.



##########
dev/tasks/macros.jinja:
##########
@@ -221,3 +222,35 @@ on:
         cp ${formula} $(brew --repository homebrew/core)/Formula/
       done
 {% endmacro %}
+
+{%- macro change_r_pkg_version(date = '$(date +%Y%m%d)') -%}
+  - name: Modify version
+    shell: bash
+    run: |
+      cd arrow/r
+      sed -i.bak -E -e \
+        's/(^Version: [0-9]+\.[0-9]+\.[0-9]+).*$/\1.'"{{ date }}"'/' \
+        DESCRIPTION
+      head DESCRIPTION
+      rm -f DESCRIPTION.bak

Review Comment:
   arrow-r-nightly has this condition, which is useful because it lets us build binaries for the release branch. I'm not sure that this is the right way to trigger it, but adding the condition would allow us to support it. (If nothing else, we should leave a TODO here noting that we will need a way to build without appending a date to the version.)
   
   ```suggestion
         if [ "{{ date }}" != "" ]; then
           sed -i.bak -E -e \
             's/(^Version: [0-9]+\.[0-9]+\.[0-9]+).*$/\1.'"{{ date }}"'/' \
             DESCRIPTION
           head DESCRIPTION
           rm -f DESCRIPTION.bak
         fi
   ```
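
A minimal, standalone sketch of the guarded version stamp suggested above, run against a throwaway DESCRIPTION file rather than a real checkout. The `stamp_version` function name, the temporary directory, and the sample version `8.0.0.9000` are assumptions made for illustration; only the `sed` expression comes from the diff.

```shell
# Hypothetical helper (not part of the PR): append a date suffix to the
# Version field of a DESCRIPTION file ($1), or leave the file untouched
# when the date ($2) is empty, e.g. when building from a release branch.
stamp_version() {
  if [ -n "$2" ]; then
    sed -i.bak -E -e \
      's/(^Version: [0-9]+\.[0-9]+\.[0-9]+).*$/\1.'"$2"'/' \
      "$1"
    rm -f "$1.bak"
  fi
}

# Demo on a temporary file so nothing in a real checkout is modified.
demo_dir="$(mktemp -d)"
printf 'Package: arrow\nVersion: 8.0.0.9000\n' > "$demo_dir/DESCRIPTION"

# Empty date: the version line is left as-is.
stamp_version "$demo_dir/DESCRIPTION" ""
unstamped="$(grep '^Version' "$demo_dir/DESCRIPTION")"

# Non-empty date: everything after major.minor.patch is replaced by the date.
stamp_version "$demo_dir/DESCRIPTION" "20220519"
stamped="$(grep '^Version' "$demo_dir/DESCRIPTION")"

echo "$unstamped"
echo "$stamped"
```

With an empty date the file is not rewritten at all, so a release-branch build would keep whatever version DESCRIPTION already carries.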



##########
dev/tasks/macros.jinja:
##########
@@ -221,3 +222,35 @@ on:
         cp ${formula} $(brew --repository homebrew/core)/Formula/
       done
 {% endmacro %}
+
+{%- macro change_r_pkg_version(date = '$(date +%Y%m%d)') -%}
+  - name: Modify version
+    shell: bash
+    run: |
+      cd arrow/r
+      sed -i.bak -E -e \
+        's/(^Version: [0-9]+\.[0-9]+\.[0-9]+).*$/\1.'"{{ date }}"'/' \
+        DESCRIPTION
+      head DESCRIPTION
+      rm -f DESCRIPTION.bak
+      cp ../dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb tools/apache-arrow.rb
+      
+      # Pin the git commit in the formula to match
+      cd tools
+      sed -i.bak -E -e 's/arrow.git"$/arrow.git", :revision => "'"{{ arrow.head }}"'"/' apache-arrow.rb
+      rm -f apache-arrow.rb.bak
+{% endmacro %}
+
+{%- macro test_r_src_pkg() -%}
+  source("https://raw.githubusercontent.com/apache/arrow/master/ci/etc/rprofile")
+  options(arrow.dev_repo = "https://nightly.wujciak.de/r")

Review Comment:
   This repo URL should be set outside of here (either as a macro or some other global setting). And it should not be your personal server.
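
One possible shape for a global setting, sketched in shell: read the repository URL from a single variable and write it into an Rprofile, much as the "Set dev repo" step in the workflow does. The variable name `ARROW_DEV_REPO` and the `example.org` default are placeholders chosen for this sketch, not names the PR or Arrow defines.

```shell
# ARROW_DEV_REPO is a hypothetical variable name; the default URL is a
# placeholder to be replaced by a project-hosted repository.
ARROW_DEV_REPO="${ARROW_DEV_REPO:-https://example.org/arrow/r}"

# Write the option into an Rprofile so later R sessions pick it up.
# A temporary file stands in for ~/.Rprofile in this demo.
rprofile="$(mktemp)"
printf 'options(arrow.dev_repo = "%s")\n' "$ARROW_DEV_REPO" >> "$rprofile"

cat "$rprofile"
```

R code can then call `getOption("arrow.dev_repo")` instead of repeating the URL.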



##########
dev/tasks/macros.jinja:
##########
@@ -221,3 +222,35 @@ on:
         cp ${formula} $(brew --repository homebrew/core)/Formula/
       done
 {% endmacro %}
+
+{%- macro change_r_pkg_version(date = '$(date +%Y%m%d)') -%}
+  - name: Modify version
+    shell: bash
+    run: |
+      cd arrow/r
+      sed -i.bak -E -e \
+        's/(^Version: [0-9]+\.[0-9]+\.[0-9]+).*$/\1.'"{{ date }}"'/' \
+        DESCRIPTION
+      head DESCRIPTION
+      rm -f DESCRIPTION.bak
+      cp ../dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb tools/apache-arrow.rb
+      
+      # Pin the git commit in the formula to match
+      cd tools
+      sed -i.bak -E -e 's/arrow.git"$/arrow.git", :revision => "'"{{ arrow.head }}"'"/' apache-arrow.rb
+      rm -f apache-arrow.rb.bak
+{% endmacro %}
+
+{%- macro test_r_src_pkg() -%}
+  source("https://raw.githubusercontent.com/apache/arrow/master/ci/etc/rprofile")

Review Comment:
   Since we're running in the arrow checkout here, we should have this file locally and don't need https:// to get it.



##########
dev/tasks/r/github.nightly.yml:
##########
@@ -0,0 +1,381 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+{% import 'macros.jinja' as macros with context %}
+{% set r_root = "nightly/r" %}
+{% set webdav_domain = "https://nightly.wujciak.de:8080" %}
+{% set repo_domain = "https://nightly.wujciak.de/r" %}
+
+{{ macros.github_header() }}
+
+jobs:
+  source:
+    name: Source Package
+    runs-on: ubuntu-latest
+    outputs:
+      version: {{ '${{ steps.save-version.outputs.version }}' }}
+      date: {{ '${{ steps.save-version.outputs.date }}' }}
+    steps:
+      {{ macros.github_checkout_arrow()|indent }}
+      {{ macros.change_r_pkg_version()|indent }}
+      - name: Save Version
+        id: save-version
+        shell: bash
+        run: | 
+          echo "::set-output name=version::$(grep ^Version arrow/r/DESCRIPTION | sed s/Version:\ //)"
+          echo "::set-output name=date::$(date +%Y%m%d)"
+
+      - uses: r-lib/actions/setup-r@v2
+        with:
+          install-r: false
+
+      - name: Build R source package
+        shell: bash
+        run: |
+          cd arrow/r
+          # Copy in the Arrow C++ source
+          make sync-cpp
+          R CMD build --no-build-vignettes .
+
+      - name: Install davfs2 & Mount Repo

Review Comment:
   Instead of using webdav to test locally, how about a workflow that is like:
   
   * jobs to build source and binary packages, store in github artifacts
   * jobs to test source and binary installation, which start with steps (macro this out) that download the artifacts, put them in the right directories for an R repo, and use `python3 -m http.server $port` to serve the repo locally, and set `options(arrow.dev_repo = "http://localhost:$port")`
   * if `nightly` (and those succeed), job to upload to webdav (and possibly test installing from there with `options(arrow.dev_repo = "https://nightlies.apache.org/arrow/r")` or whatever)
   
   We can't have PRs push to a remote nightly server to test.
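
The download-and-serve steps proposed above could be sketched roughly as follows. All three function names, the port, and the tarball name `arrow_8.0.0.20220519.tar.gz` are hypothetical; the demo runs only the layout step, using an empty placeholder file in place of a downloaded artifact.

```shell
# Hypothetical helper: copy a source tarball ($2) into the directory R
# expects under a CRAN-style repository root ($1), i.e. <root>/src/contrib/.
layout_src_repo() {
  mkdir -p "$1/src/contrib"
  cp "$2" "$1/src/contrib/"
}

# Hypothetical helper: write the PACKAGES index that install.packages() reads.
# Needs R on the PATH; not invoked in the demo below.
index_src_repo() {
  Rscript -e 'tools::write_PACKAGES(commandArgs(TRUE)[1], type = "source")' \
    "$1/src/contrib"
}

# Hypothetical helper: serve the repository root ($1) on localhost port ($2)
# in the background. Not invoked in the demo below.
serve_repo() {
  python3 -m http.server "$2" --bind 127.0.0.1 --directory "$1" >/dev/null 2>&1 &
}

# Demo: an empty placeholder file stands in for the downloaded artifact.
artifact_dir="$(mktemp -d)"
repo_root="$(mktemp -d)"
: > "$artifact_dir/arrow_8.0.0.20220519.tar.gz"

layout_src_repo "$repo_root" "$artifact_dir/arrow_8.0.0.20220519.tar.gz"
ls "$repo_root/src/contrib"
```

In a test job, the layout step would be followed by `index_src_repo "$repo_root"` and `serve_repo "$repo_root" 8000`, after which R could be pointed at the server with `options(arrow.dev_repo = "http://localhost:8000")`.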


