Posted to commits@spark.apache.org by do...@apache.org on 2020/11/05 05:59:09 UTC

[spark] branch branch-3.0 updated: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new 14eb8b16 [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job
14eb8b16 is described below

commit 14eb8b164df5fdb3715b7212ba3f5b2e88ec7c53
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Wed Nov 4 21:56:21 2020 -0800

    [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job
    
    ### What changes were proposed in this pull request?
    
    This is a backport of https://github.com/apache/spark/pull/30066 .
    
    This PR aims to use a pre-built Docker image for the GitHub Actions SparkR job.
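
    For context, the change relies on the standard GitHub Actions `container` setting to run the whole job inside a Docker image. A minimal sketch of the pattern, condensed from the diff below (the image name, runner, and test command come from this PR; the step list is trimmed for illustration):

    ```yaml
    jobs:
      sparkr:
        runs-on: ubuntu-20.04
        # Every step runs inside the pre-built image, so R and the required
        # R packages are already present and never installed on the fly.
        container:
          image: dongjoon/apache-spark-github-action-image:20201025
        steps:
        - uses: actions/checkout@v2
        - name: Run tests
          run: ./dev/run-tests --parallelism 2 --modules sparkr
    ```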
    
    ### Why are the changes needed?
    
    This will reduce both the execution time and the flakiness of the job, since R and its packages no longer have to be installed on every run.
    
    **BEFORE (branch-3.0: 21 minutes 7 seconds)**
    ![Screen Shot 2020-11-04 at 8 53 50 PM](https://user-images.githubusercontent.com/9700541/98199386-e39a1b80-1edf-11eb-8dec-c6819ebb3f0d.png)
    
    **AFTER**
    No R or R package installation steps remain; R and its packages ship inside the pre-built image.
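
    Beyond dropping the install steps, the new standalone `sparkr` job (full definition in the diff below) keeps the usual dependency caches, so the faster startup is not paid for with cold Maven builds. A minimal sketch of one of its `actions/cache` steps, with the path and keys as they appear in the diff:

    ```yaml
    - name: Cache Maven local repository
      uses: actions/cache@v2
      with:
        path: ~/.m2/repository
        # Key on the POM files so the cache refreshes when dependencies change;
        # restore-keys lets an older cache seed the build as a fallback.
        key: sparkr-maven-${{ hashFiles('**/pom.xml') }}
        restore-keys: |
          sparkr-maven-
    ```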
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Pass the GitHub Action `sparkr` job in this PR.
    
    Closes #30258 from dongjoon-hyun/SPARK-33239-3.0.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .github/workflows/build_and_test.yml | 79 ++++++++++++++++++++++++++++--------
 1 file changed, 63 insertions(+), 16 deletions(-)

diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index 7956d9e..9b4f41a 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -37,8 +37,6 @@ jobs:
             streaming, sql-kafka-0-10, streaming-kafka-0-10,
             mllib-local, mllib,
             yarn, mesos, kubernetes, hadoop-cloud, spark-ganglia-lgpl
-          - >-
-            sparkr
         # Here, we split Hive and SQL tests into some of slow ones and the rest of them.
         included-tags: [""]
        # Some tests are disabled in GitHub Actions. Ideally, we should remove this tag
@@ -131,20 +129,6 @@ jobs:
       run: |
         python3.8 -m pip install numpy 'pyarrow<3.0.0' pandas scipy xmlrunner
         python3.8 -m pip list
-    # SparkR
-    - name: Install R 4.0
-      uses: r-lib/actions/setup-r@v1
-      if: contains(matrix.modules, 'sparkr')
-      with:
-        r-version: 4.0
-    - name: Install R packages
-      if: contains(matrix.modules, 'sparkr')
-      run: |
-        # qpdf is required to reduce the size of PDFs to make CRAN check pass. See SPARK-32497.
-        sudo apt-get install -y libcurl4-openssl-dev qpdf
-        sudo Rscript -e "install.packages(c('knitr', 'rmarkdown', 'testthat', 'devtools', 'e1071', 'survival', 'arrow', 'roxygen2'), repos='https://cloud.r-project.org/')"
-        # Show installed packages in R.
-        sudo Rscript -e 'pkg_list <- as.data.frame(installed.packages()[, c(1,3:4)]); pkg_list[is.na(pkg_list$Priority), 1:2, drop = FALSE]'
     # Run the tests.
     - name: Run tests
       run: |
@@ -246,6 +230,69 @@ jobs:
         name: unit-tests-log-${{ matrix.modules }}--1.8-hadoop2.7-hive2.3
         path: "**/target/unit-tests.log"
 
+  sparkr:
+    name: Build modules - sparkr
+    runs-on: ubuntu-20.04
+    container:
+      image: dongjoon/apache-spark-github-action-image:20201025
+    env:
+      HADOOP_PROFILE: hadoop2.7
+      HIVE_PROFILE: hive2.3
+      GITHUB_PREV_SHA: ${{ github.event.before }}
+    steps:
+    - name: Checkout Spark repository
+      uses: actions/checkout@v2
+      # In order to fetch changed files
+      with:
+        fetch-depth: 0
+    # Cache local repositories. Note that GitHub Actions cache has a 2G limit.
+    - name: Cache Scala, SBT, Maven and Zinc
+      uses: actions/cache@v2
+      with:
+        path: |
+          build/apache-maven-*
+          build/zinc-*
+          build/scala-*
+          build/*.jar
+        key: build-${{ hashFiles('**/pom.xml', 'project/build.properties', 'build/mvn', 'build/sbt', 'build/sbt-launch-lib.bash', 'build/spark-build-info') }}
+        restore-keys: |
+          build-
+    - name: Cache Maven local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.m2/repository
+        key: sparkr-maven-${{ hashFiles('**/pom.xml') }}
+        restore-keys: |
+          sparkr-maven-
+    - name: Cache Ivy local repository
+      uses: actions/cache@v2
+      with:
+        path: ~/.ivy2/cache
+        key: sparkr-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+        restore-keys: |
+          sparkr-ivy-
+    - name: Run tests
+      run: |
+        mkdir -p ~/.m2
+        # The following are also used by `r-lib/actions/setup-r` to avoid
+        # R issues in the Docker environment
+        export TZ=UTC
+        export _R_CHECK_SYSTEM_CLOCK_=FALSE
+        ./dev/run-tests --parallelism 2 --modules sparkr
+        rm -rf ~/.m2/repository/org/apache/spark
+    - name: Upload test results to report
+      if: always()
+      uses: actions/upload-artifact@v2
+      with:
+        name: test-results-sparkr--1.8-hadoop2.7-hive2.3
+        path: "**/target/test-reports/*.xml"
+    - name: Upload unit tests log files
+      if: failure()
+      uses: actions/upload-artifact@v2
+      with:
+        name: unit-tests-log-sparkr--1.8-hadoop2.7-hive2.3
+        path: "**/target/unit-tests.log"
+
   # Static analysis, and documentation build
   lint:
     name: Linters, licenses, dependencies and documentation generation


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org