Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/29 14:45:44 UTC

[GitHub] [spark] LuciferYang opened a new pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

LuciferYang opened a new pull request #31995:
URL: https://github.com/apache/spark/pull/31995


   ### What changes were proposed in this pull request?
   Some of the `spark-submit` commands given in the guide for running benchmarks are wrong, so the benchmarks cannot be run successfully with them.
   
   The main change in this PR corrects those commands. For example, to run a benchmark that inherits from `SqlBasedBenchmark`, we must specify `--jars <spark core test jar>,<spark catalyst test jar>`, because a `SqlBasedBenchmark`-based benchmark extends `BenchmarkBase` (defined in the spark core test jar) and `SQLHelper` (defined in the spark catalyst test jar).
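   
   As a rough illustration of that command shape (a sketch, not part of the PR itself), the snippet below assembles the invocation for one `SqlBasedBenchmark`-based class and prints it instead of running it. The catalyst and sql jar file names are assumptions modeled on the core test jar name that appears elsewhere in this thread; substitute the jars from your own build.
   
   ```shell
   # Hypothetical jar names; only the core test jar name is taken from this thread.
   CORE_TEST_JAR="spark-core_2.12-3.2.0-SNAPSHOT-tests.jar"
   CATALYST_TEST_JAR="spark-catalyst_2.12-3.2.0-SNAPSHOT-tests.jar"
   SQL_TEST_JAR="spark-sql_2.12-3.2.0-SNAPSHOT-tests.jar"
   # One of the benchmarks touched by this PR (class name derived from its file path).
   BENCHMARK_CLASS="org.apache.spark.sql.execution.benchmark.PrimitiveArrayBenchmark"
   
   # --jars takes a comma-separated list with no spaces; the jar that contains
   # the benchmark class follows as the positional application resource.
   EXTRA_JARS="${CORE_TEST_JAR},${CATALYST_TEST_JAR}"
   CMD="bin/spark-submit --class ${BENCHMARK_CLASS} --jars ${EXTRA_JARS} ${SQL_TEST_JAR}"
   
   # Print rather than execute, so the sketch runs without a Spark distribution.
   echo "${CMD}"
   ```
   
   Printing the command first makes it easy to check that the application jar comes last and is not swallowed by `--jars`.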
   
   The other change removes the scalatest `Assertions` dependency from the benchmarks: the `scalatest-*.jar` files are not in the distribution package, so depending on them makes the benchmarks troublesome to run.
   
   ### Why are the changes needed?
   To make sure the benchmarks can be run with the `spark-submit` commands described in the guide.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Ran the benchmarks successfully using the corrected `spark-submit` commands.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809730493


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41245/
   




[GitHub] [spark] LuciferYang commented on a change in pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #31995:
URL: https://github.com/apache/spark/pull/31995#discussion_r603363423



##########
File path: mllib/src/test/scala/org/apache/spark/mllib/linalg/UDTSerializationBenchmark.scala
##########
@@ -24,7 +24,9 @@ import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
  * Serialization benchmark for VectorUDT.
  * To run this benchmark:
  * {{{
- * 1. without sbt: bin/spark-submit --class <this class> <spark mllib test jar>
+ * 1. without sbt:
+ *    bin/spark-submit --class <this class>
+ *      --jars <spark core test jar> <spark mllib test jar>

Review comment:
       `--jars` needs to include the `spark core test jar` because `BenchmarkBase` is defined in that module.
   






[GitHub] [spark] MaxGekk commented on a change in pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #31995:
URL: https://github.com/apache/spark/pull/31995#discussion_r603407358



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertTableWithDynamicPartitionsBenchmark.scala
##########
@@ -23,7 +23,9 @@ import org.apache.spark.benchmark.Benchmark
  * Benchmark to measure insert into table with dynamic partition columns.
  * To run this benchmark:
  * {{{
- *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class>
+ *        --jars <spark core test jar>,<spark catalyst test jar> < spark sql test jar>

Review comment:
       nit: `< spark sql test jar>` -> `<spark sql test jar>`

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/PrimitiveArrayBenchmark.scala
##########
@@ -23,7 +23,9 @@ import org.apache.spark.sql.SparkSession
 /**
  * Benchmark primitive arrays via DataFrame and Dataset program using primitive arrays
  * To run this benchmark:
- * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
+ * 1. without sbt:
+ *    bin/spark-submit --class <this class>
+ *      --jars <spark core test jar>,<spark catalyst test jar> < spark sql test jar>

Review comment:
       nit: `< spark sql test jar>` -> `<spark sql test jar>`






[GitHub] [spark] LuciferYang commented on a change in pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #31995:
URL: https://github.com/apache/spark/pull/31995#discussion_r603736602



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/InsertTableWithDynamicPartitionsBenchmark.scala
##########
@@ -23,7 +23,9 @@ import org.apache.spark.benchmark.Benchmark
  * Benchmark to measure insert into table with dynamic partition columns.
  * To run this benchmark:
  * {{{
- *   1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class>
+ *        --jars <spark core test jar>,<spark catalyst test jar> < spark sql test jar>

Review comment:
       done 

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/PrimitiveArrayBenchmark.scala
##########
@@ -23,7 +23,9 @@ import org.apache.spark.sql.SparkSession
 /**
  * Benchmark primitive arrays via DataFrame and Dataset program using primitive arrays
  * To run this benchmark:
- * 1. without sbt: bin/spark-submit --class <this class> <spark sql test jar>
+ * 1. without sbt:
+ *    bin/spark-submit --class <this class>
+ *      --jars <spark core test jar>,<spark catalyst test jar> < spark sql test jar>

Review comment:
       done 






[GitHub] [spark] SparkQA commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809442074


   **[Test build #136663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136663/testReport)** for PR 31995 at commit [`c76be67`](https://github.com/apache/spark/commit/c76be6774612ec2b95d03127538da37445a76083).




[GitHub] [spark] SparkQA commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809699780


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41245/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809494673


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136663/
   




[GitHub] [spark] SparkQA commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809718619


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41245/
   




[GitHub] [spark] LuciferYang commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809963351


   > @LuciferYang can we create backporting PRs?
   
   OK ~




[GitHub] [spark] HyukjinKwon closed pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #31995:
URL: https://github.com/apache/spark/pull/31995


   




[GitHub] [spark] LuciferYang commented on a change in pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #31995:
URL: https://github.com/apache/spark/pull/31995#discussion_r603362500



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala
##########
@@ -32,7 +32,7 @@ import org.apache.spark.sql.types._
  * {{{
  *   To run this benchmark:
  *   1. without sbt: bin/spark-submit --class <this class>
- *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
+ *        --jars <catalyst test jar>,<core test jar>,<sql test jar>,<spark-avro jar> <avro test jar>

Review comment:
       `--jars` needs to include `<sql test jar>` because `SqlBasedBenchmark` is defined in that module.






[GitHub] [spark] LuciferYang commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809479035


   > Thanks. All of these are tested manually, @LuciferYang ?
   
   Yes, I tested all the benchmarks manually. 




[GitHub] [spark] AmplabJenkins commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809730493


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41245/
   




[GitHub] [spark] LuciferYang commented on a change in pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #31995:
URL: https://github.com/apache/spark/pull/31995#discussion_r603362500



##########
File path: external/avro/src/test/scala/org/apache/spark/sql/execution/benchmark/AvroReadBenchmark.scala
##########
@@ -32,7 +32,7 @@ import org.apache.spark.sql.types._
  * {{{
  *   To run this benchmark:
  *   1. without sbt: bin/spark-submit --class <this class>
- *        --jars <catalyst test jar>,<core test jar>,<spark-avro jar> <avro test jar>
+ *        --jars <catalyst test jar>,<core test jar>,<sql test jar>,<spark-avro jar> <avro test jar>

Review comment:
       `--jars` needs to include the `sql test jar` because `SqlBasedBenchmark` is defined in that module.






[GitHub] [spark] LuciferYang commented on a change in pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on a change in pull request #31995:
URL: https://github.com/apache/spark/pull/31995#discussion_r603361122



##########
File path: core/src/test/scala/org/apache/spark/MapStatusesSerDeserBenchmark.scala
##########
@@ -28,7 +26,7 @@ import org.apache.spark.storage.BlockManagerId
  * Benchmark for MapStatuses serialization & deserialization performance.
  * {{{
  *   To run this benchmark:
- *   1. without sbt: bin/spark-submit --class <this class> --jars <core test jar>
+ *   1. without sbt: bin/spark-submit --class <this class> <spark core test jar>

Review comment:
       Running the previously documented command fails:
   ```
   bin/spark-submit --class org.apache.spark.MapStatusesSerDeserBenchmark --jars spark-core_2.12-3.2.0-SNAPSHOT-tests.jar
   Error: Missing application resource.
   ```
   We should remove `--jars` and pass the core test jar as the application resource, because `MapStatusesSerDeserBenchmark` runs in local mode.
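   
   For comparison, a sketch of the corrected form, following the updated Scaladoc line in the diff above. It only builds and prints the command; the jar name is the one from the failing command and will differ for other builds.
   
   ```shell
   # Jar name copied from the failing command quoted in this comment; adjust to your build.
   CORE_TEST_JAR="spark-core_2.12-3.2.0-SNAPSHOT-tests.jar"
   BENCHMARK_CLASS="org.apache.spark.MapStatusesSerDeserBenchmark"
   
   # No --jars here: the core test jar is passed as the application resource.
   CMD="bin/spark-submit --class ${BENCHMARK_CLASS} ${CORE_TEST_JAR}"
   
   # Print rather than execute, so the sketch runs without a Spark distribution.
   echo "${CMD}"
   ```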






[GitHub] [spark] LuciferYang edited a comment on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang edited a comment on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809479035


   > Thanks. All of these are tested manually, @LuciferYang ?
   
   Yes, I tested all benchmarks manually. 




[GitHub] [spark] LuciferYang commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-810216500


   > @LuciferYang can we create backporting PRs?
   
   @HyukjinKwon 
   
   - branch-3.1: https://github.com/apache/spark/pull/32002
   - branch-3.0: https://github.com/apache/spark/pull/32003
   
   Does this still need backporting to 2.4?




[GitHub] [spark] AmplabJenkins commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809494673


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136663/
   




[GitHub] [spark] HyukjinKwon commented on pull request #31995: [SPARK-34900][TEST] Make sure benchmarks can run using spark-submit cmd described in the guide

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #31995:
URL: https://github.com/apache/spark/pull/31995#issuecomment-809867659


   Merged to master.
   
   @LuciferYang can we create backporting PRs?

