Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/28 16:53:31 UTC

[GitHub] [spark] kazuyukitanimura opened a new pull request, #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

kazuyukitanimura opened a new pull request, #37020:
URL: https://github.com/apache/spark/pull/37020

   ### What changes were proposed in this pull request?
   Currently, `org.apache.spark.sql.execution.benchmark.TPCDSQueryBenchmark` cannot be started via GitHub Actions because it requires an extra `--data-location` argument.
   
   This PR proposes an option to pass that argument through the `SPARK_TPCDS_DATA` environment variable, which is the same approach taken by `TPCDSQueryTestSuite`. This PR also adds a GitHub workflow step that generates the TPC-DS data and sets the `SPARK_TPCDS_DATA` location.
   Developers can still pass the `--data-location` argument to `TPCDSQueryBenchmark`, just as before, for manual testing.
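
The lookup described above can be illustrated with a minimal, language-agnostic sketch in Python. It is not the code in this PR: the function name `resolve_data_location` is hypothetical, and the precedence (an explicit `--data-location` wins over `SPARK_TPCDS_DATA`) is an assumption, since the description does not state which one takes priority.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve_data_location(
    argv: Sequence[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the TPC-DS data directory (hypothetical helper, not Spark code).

    Assumption: an explicit --data-location argument takes precedence,
    and SPARK_TPCDS_DATA is only used as a fallback.
    """
    for i, arg in enumerate(argv):
        # Manual runs: "--data-location <path>" on the command line.
        if arg == "--data-location" and i + 1 < len(argv):
            return argv[i + 1]
    # CI runs: the workflow exports SPARK_TPCDS_DATA instead.
    # An unset or empty variable is treated as "no location given".
    return env.get("SPARK_TPCDS_DATA") or None
```

With this shape, a GitHub Actions job only needs to export the environment variable, while a developer's existing command line keeps working unchanged.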
   
   This PR is a refinement of [the earlier attempt](https://github.com/apache/spark/pull/33544).
   
   
   ### Why are the changes needed?
   Currently, `TPCDSQueryBenchmark` is excluded from benchmark runs via GitHub Actions. As a result, benchmark results such as `TPCDSQueryBenchmark-results.txt` are not updated regularly and become obsolete.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Tested on GitHub Actions.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1169562877

   > Need to add a `TPCDSQueryBenchmark-jdk17-results.txt` file @kazuyukitanimura
   
   Thanks @LuciferYang Added it




[GitHub] [spark] dongjoon-hyun closed pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks
URL: https://github.com/apache/spark/pull/37020




[GitHub] [spark] HyukjinKwon commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1174623205

   I am fine if it works.




[GitHub] [spark] LuciferYang commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1169468277

   Need to add a `TPCDSQueryBenchmark-jdk17-results.txt` file
   
   




[GitHub] [spark] LuciferYang commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175830680

   Sorry, there's a problem I didn't notice before.
   
   I found that the `Generate an input dataset for TPCDSQueryBenchmark with SF=1` step is always executed, even if I didn't specify `TPCDSQueryBenchmark` as the benchmark to run. Is this what we want? For example:
   
   https://github.com/LuciferYang/spark/runs/7207453029?check_suite_focus=true
   
   I just want to run `MapStatusesConvertBenchmark`.
   
   




[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175433010

   Thank you Dongjoon! 




[GitHub] [spark] HyukjinKwon commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175901522

   We should of course skip that.




[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175285111

   Thank you @dongjoon-hyun @wangyum @LuciferYang @HyukjinKwon for the review. Could you please help me with merging this if there is no more feedback?




[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1176918916

   Sure @HyukjinKwon I will try your approach




[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1168987767

   cc @MyeongKim @HyukjinKwon @pingsutw @maropu @wangyum




[GitHub] [spark] LuciferYang commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
LuciferYang commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1175904553

   I will file a new Jira first 




[GitHub] [spark] kazuyukitanimura commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
kazuyukitanimura commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1176508666

   Thanks @LuciferYang. I thought the cache would work okay, but I agree that it is better to skip the entire step based on the glob results.




[GitHub] [spark] HyukjinKwon commented on pull request #37020: [SPARK-34927][INFRA] Support TPCDSQueryBenchmark in Benchmarks

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37020:
URL: https://github.com/apache/spark/pull/37020#issuecomment-1176872006

   @kazuyukitanimura would you mind taking a look? I think we can at least skip it when the input "Benchmark class" doesn't contain `*` and isn't TPC-DS, which should be the most common case.
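
The skip rule suggested here could look roughly like the following Python sketch. It is only an illustration of the decision, not the workflow change that was eventually made: the function name `should_generate_tpcds_data` is hypothetical, and treating any input containing `*` as "might select TPC-DS" is a deliberately conservative assumption.

```python
def should_generate_tpcds_data(benchmark_class: str) -> bool:
    """Decide whether the TPC-DS data generation step is needed.

    Hypothetical helper. Returns False only in the case described in the
    comment above: no wildcard in the input and the input is not the
    TPC-DS benchmark.
    """
    pattern = benchmark_class.strip()
    if "*" in pattern:
        # A wildcard may or may not select TPCDSQueryBenchmark;
        # assume it does rather than risk running without data.
        return True
    # No wildcard: data is needed only if TPC-DS was named explicitly.
    return "TPCDSQueryBenchmark" in pattern
```

Under this rule, a run that names only `MapStatusesConvertBenchmark` skips the data generation step, while `*` or an explicit `TPCDSQueryBenchmark` still triggers it.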

