You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/05/14 20:11:18 UTC

[GitHub] [iceberg] kbendick opened a new pull request #2595: Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

kbendick opened a new pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595


   Right now, we have JMH benchmarks to allow performance comparisons between the native spark sources / sinks vs the iceberg sources and sinks in spark.
   
   However, these JMH test suites don't get registered for Spark 3.
   
   I have moved the JMH test code to `spark/src/jmh/...` and then added that to the src directories for those projects.
   
   Now, depending on which JDK one is using, you can run either Spark 2 tests (with JDK 8) or Spark 3 tests (with JDK 11).
   
   It would still be possible to add specific benchmarks to Spark 3 or Spark 2 by placing them in `spark[2|3]/src/jmh/...`.
   
   For now, all of the code is left as is and has been copied over (with the command to run updated in each file). I've also added a `.gitkeep` file in `spark2/benchmark` and `spark3/benchmark` so that users can run the commands as is without having to create the folders.
   
   Based on the way the tests are currently run, it would be difficult to create the folder and the given name per test without a larger refactoring, and I wanted to keep this change as small as possible.
   
   This closes https://github.com/apache/iceberg/issues/2590


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#issuecomment-858082399


   > > @kbendick got this error
   > > ```
   > > Execution failed for task ':iceberg-spark2:jmh'.
   > > > A failure occurred while executing me.champeau.gradle.IsolatedRunner
   > >    > Error during execution of benchmarks
   > > ```
   > > 
   > > 
   > >     
   > >       
   > >     
   > > 
   > >       
   > >     
   > > 
   > >     
   > >   
   > > When i ran
   > > `g jmh`
   > > on the pr
   > 
   > So just like the current set up (as far as I'm aware), you have to specify the actual test to be run. So something like the following, as mentioned in the existing code [here](https://github.com/apache/iceberg/pull/2595/files#diff-ed844daf9dae5835b7ed247c21439af720df778c28596c644a9da8d7013f4fb9L56-L58)
   > 
   > ```
   > ./gradlew :iceberg-spark2:jmh \ 
   >    -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark \
   >    -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt
   > ```
   > 
   > I will test simply `./gradlew jmh` on the current state of master vs my PR, but with my PR, the tests run the same way as mentioned in the doc comments of each test. @RussellSpitzer
   
   I tried running `./gradlew jmh` both from master and from my branch. Both of them failed eventually, but after running for 20-25 minutes and completing at least 3-4 of the benchmarks each time. @RussellSpitzer has now been able to get plain `./gradle jmh` to run for over an hour now on this PR branch.
   
   When I run an individual test, as recommended in the comments of each test (or above), it runs just fine to completion.
   
   I think it's more an issue with gradle JVM settings and then running out of memory or other resources, possibly with the number of forked threads when running all of the benchmarks for all of the requested iterations.
   
   **TLDR:** I don't think we should run `jmh` itself to run _all_ tests without better configuring gradle. But stick to running the tests as they're mentioned in the test comments. This does not introduce any new issues with the JMH tests, other than possibly the added number of tests if one runs `./gradlew jmh` by itself (which is not guaranteed to run to completion anyway with the current state in master).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#discussion_r648602486



##########
File path: jmh.gradle
##########
@@ -17,7 +17,14 @@
  * under the License.
  */
 
-def jmhProjects = [ project("iceberg-spark2") ]
+def jmhProjects

Review comment:
       Yeah. I could use `deef jmhProjects = []`, but IntelliJ complains about unused variable.
   
   I then tried to be a bit more Scala like, and did `def jmhProjects = { ... return array_of_what_i_want }` but that generated a closure.
   
   Basically, the best I think I can offer is `def jmhProjects = []` and then leave it like that.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#issuecomment-860891810


   Hi @kbendick , sorry for the delayed review. Both spark2 and spark3 jmh commands works well in my laptop with your PR. For Spark3, I can do with both jdk8 and jdk11. Looks good to me overall. One question, is it possible to let gradle create dirs(`spark2/benchmark` and `spark3/benchmark`) instead of using `.gitkeep` files?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#issuecomment-858022876


   > @kbendick got this error
   > 
   > ```
   > Execution failed for task ':iceberg-spark2:jmh'.
   > > A failure occurred while executing me.champeau.gradle.IsolatedRunner
   >    > Error during execution of benchmarks
   > ```
   > 
   > When i ran
   > `g jmh`
   > 
   > on the pr
   
   So just like the current set up (as far as I'm aware), you have to specify the actual test to be run. So something like the following, as mentioned in the existing code [here](https://github.com/apache/iceberg/pull/2595/files#diff-ed844daf9dae5835b7ed247c21439af720df778c28596c644a9da8d7013f4fb9L56-L58)
   ```
   ./gradlew :iceberg-spark2:jmh \ 
      -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark \
      -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt
   ```
   
   I will test simply `./gradlew jmh` on the current state of master vs my PR, but with my PR, the tests run the same way as mentioned in the doc comments of each test. @RussellSpitzer 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#discussion_r646225786



##########
File path: jmh.gradle
##########
@@ -17,7 +17,14 @@
  * under the License.
  */
 
-def jmhProjects = [ project("iceberg-spark2") ]
+def jmhProjects = []
+if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
+  jmhProjects = [ project("iceberg-spark2") ]

Review comment:
       Alright. This has been fixed :). Confirmed by setting my terminals JDK to java 8, then running both a spark 2 and spark 3 JMH test, then setting to Java 11, and running a Spark 3 test (and back again). 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#discussion_r646214298



##########
File path: jmh.gradle
##########
@@ -17,7 +17,14 @@
  * under the License.
  */
 
-def jmhProjects = [ project("iceberg-spark2") ]
+def jmhProjects = []
+if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
+  jmhProjects = [ project("iceberg-spark2") ]

Review comment:
       Ah. Confirmed the issue is 100% with my own understanding of groovy / gradle (and always wanting it to be more like Scala).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer merged pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
RussellSpitzer merged pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#discussion_r646208206



##########
File path: jmh.gradle
##########
@@ -17,7 +17,14 @@
  * under the License.
  */
 
-def jmhProjects = [ project("iceberg-spark2") ]
+def jmhProjects = []
+if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
+  jmhProjects = [ project("iceberg-spark2") ]

Review comment:
       It would indeed make sense.
   
   I was having trouble getting both projects to work at one time, so I chose to just skip it as that's about as far as my gradle abilities got me. I tried adding to the array, etc, but the plugin didn't like that.
   
   I'll look into it further to see if it can be supported. Thanks for the review!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#discussion_r646209179



##########
File path: jmh.gradle
##########
@@ -17,7 +17,14 @@
  * under the License.
  */
 
-def jmhProjects = [ project("iceberg-spark2") ]
+def jmhProjects = []
+if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
+  jmhProjects = [ project("iceberg-spark2") ]

Review comment:
       Looking into it again, I think probably the issue was that I didn't run `./gradlew clean` when switching between JDKs locally. Pushing a fix now!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#issuecomment-856128048


   @kbendick  got this error
   ```* What went wrong:
   Execution failed for task ':iceberg-spark2:jmh'.
   > A failure occurred while executing me.champeau.gradle.IsolatedRunner
      > Error during execution of benchmarks
      ```
      
      When i ran 
      ```g jmh```
      
      on the pr


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#discussion_r646794138



##########
File path: jmh.gradle
##########
@@ -17,7 +17,14 @@
  * under the License.
  */
 
-def jmhProjects = [ project("iceberg-spark2") ]
+def jmhProjects

Review comment:
       The formatting here is a bit odd but other than this, looks good to me




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer merged pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
RussellSpitzer merged pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] holdenk commented on a change in pull request #2595: [SPARK] Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
holdenk commented on a change in pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#discussion_r646182902



##########
File path: jmh.gradle
##########
@@ -17,7 +17,14 @@
  * under the License.
  */
 
-def jmhProjects = [ project("iceberg-spark2") ]
+def jmhProjects = []
+if (JavaVersion.current() == JavaVersion.VERSION_1_8) {
+  jmhProjects = [ project("iceberg-spark2") ]

Review comment:
       I _think_ Spark 3 supports JDK 1.8, would it make sense to do iceberg-spark3 here as well? Or is there a reason for only doing one of them?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on pull request #2595: Allow existing spark2 JMH benchmarks to work with either spark 2 or spark 3

Posted by GitBox <gi...@apache.org>.
kbendick commented on pull request #2595:
URL: https://github.com/apache/iceberg/pull/2595#issuecomment-841477504


   cc @aokolnychyi @RussellSpitzer @flyrain @karuppayya @raptond @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org