Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/21 11:27:48 UTC

[GitHub] [spark] HyukjinKwon opened a new pull request, #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

HyukjinKwon opened a new pull request, #36940:
URL: https://github.com/apache/spark/pull/36940

   ### What changes were proposed in this pull request?
   
   This PR borrows the idea from https://github.com/apache/spark/pull/36928 but adds some more changes so that scheduled jobs share the `precondition` job and all conditional logic is consolidated there.
   
   This PR also adds a new option to `is-changed.py` so that dependent modules can be checked together. That way, we don't have to change `build_and_test.yml` every time we add a new module.
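   
   For example, a job gate can then be computed with a single invocation (a minimal sketch of the intended usage, matching the workflow diff discussed in the review below):
   
   ```
   # Prints true or false: should the sparkr job run for this change set?
   # Dependent modules are taken into account, so one module name suffices.
   ./dev/is-changed.py -m sparkr
   ```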
   
   Lastly, this PR removes the `type` input because the `precondition` job now replaces it.
   
   ### Why are the changes needed?
   
   To make the CI workflow configuration easier to read.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, dev-only.
   
   ### How was this patch tested?
   
   Being tested in my fork and locally.



[GitHub] [spark] dongjoon-hyun commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1163272120

   Merged to master.



[GitHub] [spark] HyukjinKwon commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1162956984

   Will merge this tomorrow morning my time (KST) ... so I can monitor and fix it if something goes wrong ...



[GitHub] [spark] EnricoMi commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
EnricoMi commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r902668952


##########
.github/workflows/build_java11.yml:
##########
@@ -32,9 +32,16 @@ jobs:
       java: 11
       branch: master
       hadoop: hadoop3
-      type: scheduled
       envs: >-
         {
           "SKIP_MIMA": "true",
           "SKIP_UNIDOC": "true"
         }
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",

Review Comment:
   The `pyspark` job did not previously run for Java 11:
   
   ```
   inputs.type == 'scheduled' && inputs.java == '17'
   ```



##########
.github/workflows/build_hadoop2.yml:
##########
@@ -32,4 +32,11 @@ jobs:
       java: 8
       branch: master
       hadoop: hadoop2
-      type: scheduled
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",
+          "sparkr": "true",

Review Comment:
   The `sparkr` job did not previously run for scheduled Java 8:
   
   ```
   inputs.type == 'scheduled' && inputs.java == '17'
   ```



##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -c -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -c -m sparkr`
+            tpcds=`./dev/is-changed.py -c -m sql`
+            docker=`./dev/is-changed.py -c -m docker-integration-tests`
+          fi
+          # 'build', 'scala-213', and 'java-11-17' are always true for now.
+          # It dose not save significant time and most of PRs trigger the build.
+          precondition="
+            {
+              \"build\": \"true\",
+              \"pyspark\": \"$pyspark\",
+              \"sparkr\": \"$sparkr\",
+              \"tpcds-1g\": \"$tpcds\",
+              \"docker-integration-tests\": \"$docker\",
+              \"scala-213\": \"true\",
+              \"java-11-17\": \"true\",
+              \"lint\" : \"true\"
+            }"

Review Comment:
   This is bash, so you can replace the outer `"` with `'` and get rid of the escaped quotes:
   ```suggestion
             precondition='
               {
                 "build": "true",
                 "pyspark": "$pyspark",
                 "sparkr": "$sparkr",
                 "tpcds-1g": "$tpcds",
                 "docker-integration-tests": "$docker",
                 "scala-213": "true",
                 "java-11-17": "true",
                 "lint" : "true"
               }'
   ```



##########
.github/workflows/build_hadoop2.yml:
##########
@@ -32,4 +32,11 @@ jobs:
       java: 8
       branch: master
       hadoop: hadoop2
-      type: scheduled
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",

Review Comment:
   The `pyspark` job did not previously run for scheduled Java 8:
   
   ```
   inputs.type == 'scheduled' && inputs.java == '17'
   ```



##########
.github/workflows/build_java11.yml:
##########
@@ -32,9 +32,16 @@ jobs:
       java: 11
       branch: master
       hadoop: hadoop3
-      type: scheduled
       envs: >-
         {
           "SKIP_MIMA": "true",
           "SKIP_UNIDOC": "true"
         }
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",
+          "sparkr": "true",

Review Comment:
   The `sparkr` job did not previously run for Java 11:
   
   ```
   (inputs.type == 'scheduled' && inputs.java == '17')
   ```




[GitHub] [spark] dongjoon-hyun closed pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition
URL: https://github.com/apache/spark/pull/36940



[GitHub] [spark] HyukjinKwon commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1161626590

   cc @dongjoon-hyun @Yikun @gengliangwang @EnricoMi @MaxGekk FYI



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r903555095


##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,47 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -m sparkr`
+            tpcds=`./dev/is-changed.py -m sql`

Review Comment:
   In fact, we don't need any change in the script. The script already checks the dependent modules.
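   
   A quick way to see this locally (a sketch; assumes a Spark checkout, and that `Module.dependencies` in `dev/sparktestsupport/modules.py` holds each module's direct upstream modules):
   
   ```
   # Show what the 'sql' module depends on; a change in any of these (or their
   # own dependencies) makes './dev/is-changed.py -m sql' report true.
   cd dev && python -c "import sparktestsupport.modules as m; print([d.name for d in next(mod for mod in m.all_modules if mod.name == 'sql').dependencies])"
   ```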




[GitHub] [spark] EnricoMi commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
EnricoMi commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1162415553

   I found it best to test by removing `if "root" in test_modules` from `dev/is-changed.py`, so that the changes from this PR are not considered by the CI in this PR. Then, making some changes in Spark or PySpark should trigger only those jobs to be built.
   
   The scheduled jobs are best tested by declaring this branch the default branch, so that GitHub runs them as defined here.
   
   Besides that, it can only be tested from master, sadly.



[GitHub] [spark] HyukjinKwon commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1162459034

   > BTW, how do we validate this PR?
   
   I will test locally and try to run the scheduled jobs in my fork with some manual changes (to allow them to run in my fork). That's not perfect, but it would still test this reasonably well before getting it in.



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r903376866


##########
.github/workflows/build_java11.yml:
##########
@@ -32,9 +32,16 @@ jobs:
       java: 11
       branch: master
       hadoop: hadoop3
-      type: scheduled
       envs: >-
         {
           "SKIP_MIMA": "true",
           "SKIP_UNIDOC": "true"
         }
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",
+          "sparkr": "true",

Review Comment:
   Yup, I should have noted it in the PR description. Thanks for pointing it out. This PR enables several more tests.




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r903441614


##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -c -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -c -m sparkr`
+            tpcds=`./dev/is-changed.py -c -m sql`
+            docker=`./dev/is-changed.py -c -m docker-integration-tests`
+          fi
+          # 'build', 'scala-213', and 'java-11-17' are always true for now.
+          # It does not save significant time and most of PRs trigger the build.
+          precondition="
+            {
+              \"build\": \"true\",
+              \"pyspark\": \"$pyspark\",
+              \"sparkr\": \"$sparkr\",
+              \"tpcds-1g\": \"$tpcds\",
+              \"docker-integration-tests\": \"$docker\",
+              \"scala-213\": \"true\",
+              \"java-11-17\": \"true\",
+              \"lint\" : \"true\"
+            }"
+          echo $precondition # For debugging
+          echo "::set-output name=required::$precondition"

Review Comment:
   Oops, I saw this after pushing some changes. Thanks for pointing this out.





[GitHub] [spark] HyukjinKwon commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1163598207

   Thanks!



[GitHub] [spark] EnricoMi commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
EnricoMi commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r902735413


##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`

Review Comment:
   ```suggestion
               pyspark_modules=`cd dev; python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
   ```
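   
   For context on the `cd dev`: `sparktestsupport` is a plain package directory under `dev/`, not an installed package, so the import only resolves with `dev/` as the working directory (a minimal sketch):
   
   ```
   # From the repository root this raises ModuleNotFoundError:
   #   python -c "import sparktestsupport.modules"
   # From dev/, where the sparktestsupport/ package lives, it works:
   cd dev && python -c "import sparktestsupport.modules as m; print(len(m.all_modules))"
   ```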




[GitHub] [spark] EnricoMi commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
EnricoMi commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r903409215


##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`cd dev && python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -c -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -c -m sparkr`
+            tpcds=`./dev/is-changed.py -c -m sql`
+            docker=`./dev/is-changed.py -c -m docker-integration-tests`
+          fi
+          # 'build', 'scala-213', and 'java-11-17' are always true for now.
+          # It does not save significant time and most of PRs trigger the build.
+          precondition="
+            {
+              \"build\": \"true\",
+              \"pyspark\": \"$pyspark\",
+              \"sparkr\": \"$sparkr\",
+              \"tpcds-1g\": \"$tpcds\",
+              \"docker-integration-tests\": \"$docker\",
+              \"scala-213\": \"true\",
+              \"java-11-17\": \"true\",
+              \"lint\" : \"true\"
+            }"
+          echo $precondition # For debugging
+          echo "::set-output name=required::$precondition"

Review Comment:
   The problem here is that `$precondition` is a multi-line string, which is not supported by `set-output`.
   
   This makes the multi-line JSON into a single line:
   ```suggestion
             echo "::set-output name=required::$(jq -c . <<< "$precondition")"
   ```
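   
   A standalone illustration of the fix (a sketch; `jq` is preinstalled on GitHub-hosted runners):
   
   ```
   precondition='
     {
       "build": "true",
       "pyspark": "false"
     }'
   # 'jq -c .' parses the JSON and re-emits it compacted onto a single line,
   # which ::set-output can then carry as one output value.
   jq -c . <<< "$precondition"
   # prints: {"build":"true","pyspark":"false"}
   ```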




[GitHub] [spark] HyukjinKwon commented on pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #36940:
URL: https://github.com/apache/spark/pull/36940#issuecomment-1162917131

   I tested all the possible scenarios. Should be good to go.



[GitHub] [spark] EnricoMi commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
EnricoMi commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r902683629


##########
.github/workflows/build_scala213.yml:
##########
@@ -32,8 +32,16 @@ jobs:
       java: 8
       branch: master
       hadoop: hadoop3
-      type: scheduled
       envs: >-
         {
           "SCALA_PROFILE": "scala2.13"
         }
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",
+          "sparkr": "true",
+          "tpcds-1g": "true",
+          "docker-integration-tests": "true",
+          "lint" : "true"

Review Comment:
   The `lint` job previously ran only for regular jobs:
   
   ```
       if: inputs.type == 'regular'
   ```
   



##########
.github/workflows/build_scala213.yml:
##########
@@ -32,8 +32,16 @@ jobs:
       java: 8
       branch: master
       hadoop: hadoop3
-      type: scheduled
       envs: >-
         {
           "SCALA_PROFILE": "scala2.13"
         }
+      jobs: >-
+        {
+          "build": "true",
+          "pyspark": "true",
+          "sparkr": "true",
+          "tpcds-1g": "true",
+          "docker-integration-tests": "true",

Review Comment:
   The `tpcds-1g` and `docker-integration-tests` jobs previously ran only for regular jobs, and only when the build was required (this holds for all scheduled-job workflow files):
   
   ```
   inputs.type == 'regular' && fromJson(needs.precondition.outputs.required).build == 'true'
   ```




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r903376278


##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -c -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -c -m sparkr`
+            tpcds=`./dev/is-changed.py -c -m sql`
+            docker=`./dev/is-changed.py -c -m docker-integration-tests`
+          fi
+          # 'build', 'scala-213', and 'java-11-17' are always true for now.
+          # It dose not save significant time and most of PRs trigger the build.
+          precondition="
+            {
+              \"build\": \"true\",
+              \"pyspark\": \"$pyspark\",
+              \"sparkr\": \"$sparkr\",
+              \"tpcds-1g\": \"$tpcds\",
+              \"docker-integration-tests\": \"$docker\",
+              \"scala-213\": \"true\",
+              \"java-11-17\": \"true\",
+              \"lint\" : \"true\"
+            }"

Review Comment:
   Yeah, but then the env vars won't be interpolated properly ...
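   
   Concretely (a minimal bash sketch of the trade-off between the two quoting styles):
   
   ```
   pyspark=true
   # Double quotes with escaped inner quotes: $pyspark is expanded.
   echo "{\"pyspark\": \"$pyspark\"}"   # -> {"pyspark": "true"}
   # Single outer quotes: nothing expands; the literal $pyspark survives.
   echo '{"pyspark": "$pyspark"}'       # -> {"pyspark": "$pyspark"}
   ```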




[GitHub] [spark] EnricoMi commented on a diff in pull request #36940: [SPARK-39529][INFRA] Refactor and merge all related job selection logic into precondition

Posted by GitBox <gi...@apache.org>.
EnricoMi commented on code in PR #36940:
URL: https://github.com/apache/spark/pull/36940#discussion_r902691645


##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -c -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -c -m sparkr`
+            tpcds=`./dev/is-changed.py -c -m sql`
+            docker=`./dev/is-changed.py -c -m docker-integration-tests`

Review Comment:
   This is great: no more exhaustive enumeration of modules ...



##########
.github/workflows/build_and_test.yml:
##########
@@ -67,27 +73,42 @@ jobs:
     - name: Check all modules
       id: set-outputs
       run: |
-        # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
-        build=true; pyspark=true; sparkr=true; tpcds=true; docker=true;
-        if [ -f "./dev/is-changed.py" ]; then
-          build=`./dev/is-changed.py -m avro,build,catalyst,core,docker-integration-tests,examples,graphx,hadoop-cloud,hive,hive-thriftserver,kubernetes,kvstore,launcher,mesos,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,spark-ganglia-lgpl,sparkr,sql,sql-kafka-0-10,streaming,streaming-kafka-0-10,streaming-kinesis-asl,tags,unsafe,yarn`
-          pyspark=`./dev/is-changed.py -m avro,build,catalyst,core,graphx,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,pyspark-core,pyspark-ml,pyspark-mllib,pyspark-pandas,pyspark-pandas-slow,pyspark-resource,pyspark-sql,pyspark-streaming,repl,sketch,sql,tags,unsafe`
-          sparkr=`./dev/is-changed.py -m avro,build,catalyst,core,hive,kvstore,launcher,mllib,mllib-local,network-common,network-shuffle,repl,sketch,sparkr,sql,tags,unsafe`
-          tpcds=`./dev/is-changed.py -m build,catalyst,core,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
-          docker=`./dev/is-changed.py -m build,catalyst,core,docker-integration-tests,hive,kvstore,launcher,network-common,network-shuffle,repl,sketch,sql,tags,unsafe`
+        if [ -z "${{ inputs.jobs }}" ]; then
+          # is-changed.py is missing in branch-3.2, and it might run in scheduled build, see also SPARK-39517
+          pyspark=true; sparkr=true; tpcds=true; docker=true;
+          if [ -f "./dev/is-changed.py" ]; then
+            pyspark_modules=`python -c "import sparktestsupport.modules as m; print(','.join(m.name for m in m.all_modules if m.name.startswith('pyspark')))"`
+            pyspark=`./dev/is-changed.py -c -m $pyspark_modules`
+            sparkr=`./dev/is-changed.py -c -m sparkr`
+            tpcds=`./dev/is-changed.py -c -m sql`
+            docker=`./dev/is-changed.py -c -m docker-integration-tests`
+          fi
+          # 'build', 'scala-213', and 'java-11-17' are always true for now.
+          # It dose not save significant time and most of PRs trigger the build.

Review Comment:
   ```suggestion
             # It does not save significant time and most of PRs trigger the build.
   ```


