Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/10/21 08:29:51 UTC

[GitHub] [spark] RabbidHY opened a new pull request #34353: Set spark.sql.files.openCostInBytes to bytesConf

RabbidHY opened a new pull request #34353:
URL: https://github.com/apache/spark/pull/34353


   ### What changes were proposed in this pull request?
   
   Define `spark.sql.files.openCostInBytes` with `bytesConf` instead of `longConf`.
   
   ### Why are the changes needed?
   
   The name ends in `InBytes`, but the config actually accepts only a **long** value, which is confusing for users. After this change it also accepts a size **string** as input, which is more flexible for users.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Existing test.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950038749


   **[Test build #144550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144550/testReport)** for PR 34353 at commit [`4003f0c`](https://github.com/apache/spark/commit/4003f0c01f769e48b98573247439d1d15248d082).




[GitHub] [spark] HyukjinKwon commented on a change in pull request #34353: Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #34353:
URL: https://github.com/apache/spark/pull/34353#discussion_r733449140



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1415,8 +1415,8 @@ object SQLConf {
       " bigger files (which is scheduled first). This configuration is effective only when using" +
       " file-based sources such as Parquet, JSON and ORC.")
     .version("2.0.0")
-    .longConf
-    .createWithDefault(4 * 1024 * 1024)
+    .bytesConf(ByteUnit.BYTE)
+    .createWithDefaultString("4MB")

Review comment:
       Can we add a simple test in `ConfigEntrySuite` to make sure a bytes configuration still accepts plain long numbers?
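
The behavior this comment asks to verify can be illustrated with a small, self-contained sketch. It is not Spark's parser: `ByteSizeSketch` and `toBytes` are hypothetical names, and the suffix table assumes binary multipliers, which is consistent with the diff above replacing `4 * 1024 * 1024` with `"4MB"`. The point is only that a bare number and a suffixed string can resolve to the same byte count when the default unit is bytes.

```scala
// Hypothetical, simplified sketch of byte-size string parsing.
// Not Spark's implementation; names and suffix table are assumptions.
object ByteSizeSketch {
  // Assumed binary multipliers: "k"/"kb" = 1024 bytes, "m"/"mb" = 1024 * 1024 bytes, etc.
  private val multipliers: Map[String, Long] = Map(
    "b" -> 1L,
    "k" -> 1024L, "kb" -> 1024L,
    "m" -> 1024L * 1024, "mb" -> 1024L * 1024,
    "g" -> 1024L * 1024 * 1024, "gb" -> 1024L * 1024 * 1024
  )

  // Digits followed by an optional alphabetic suffix; must match the whole string.
  private val Pattern = "([0-9]+)([a-z]+)?".r

  /** Parse a size string into bytes; a bare number is interpreted as bytes. */
  def toBytes(value: String): Long = value.trim.toLowerCase match {
    case Pattern(number, suffix) =>
      // `suffix` is null when the optional group did not match.
      val factor = Option(suffix) match {
        case None => 1L
        case Some(s) =>
          multipliers.getOrElse(s, throw new NumberFormatException(s"Unknown size suffix: $s"))
      }
      // Fail loudly on overflow instead of wrapping around.
      Math.multiplyExact(number.toLong, factor)
    case other =>
      throw new NumberFormatException(s"Invalid size string: $other")
  }
}
```

Under these assumptions, `ByteSizeSketch.toBytes("4MB")` and `ByteSizeSketch.toBytes("4194304")` both return `4194304L`, so existing numeric settings would keep working alongside the new string form.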






[GitHub] [spark] JoshRosen commented on a change in pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
JoshRosen commented on a change in pull request #34353:
URL: https://github.com/apache/spark/pull/34353#discussion_r733910512



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1415,8 +1415,8 @@ object SQLConf {
       " bigger files (which is scheduled first). This configuration is effective only when using" +
       " file-based sources such as Parquet, JSON and ORC.")
     .version("2.0.0")
-    .longConf
-    .createWithDefault(4 * 1024 * 1024)
+    .bytesConf(ByteUnit.BYTE)
+    .createWithDefaultString("4MB")

Review comment:
       +1; we could probably extend this existing test:
   
   https://github.com/apache/spark/blob/4148fb58aada5bb7bc4835b39fe1baa07f9bacce/core/src/test/scala/org/apache/spark/internal/config/ConfigEntrySuite.scala#L90-L97






[GitHub] [spark] HyukjinKwon commented on pull request #34353: Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-948389308


   Let's also file a JIRA; see https://spark.apache.org/contributing.html




[GitHub] [spark] SparkQA removed a comment on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950038749


   **[Test build #144550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144550/testReport)** for PR 34353 at commit [`4003f0c`](https://github.com/apache/spark/commit/4003f0c01f769e48b98573247439d1d15248d082).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-948391980


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950063381


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144550/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950063381


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/144550/
   




[GitHub] [spark] HyukjinKwon commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950041897


   Merged to master.




[GitHub] [spark] SparkQA commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950043823


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49021/
   




[GitHub] [spark] SparkQA commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950056971


   **[Test build #144550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/144550/testReport)** for PR 34353 at commit [`4003f0c`](https://github.com/apache/spark/commit/4003f0c01f769e48b98573247439d1d15248d082).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #34353: Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-948391980


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950048633


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49021/
   




[GitHub] [spark] AmplabJenkins commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950051491


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49021/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950051491


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49021/
   




[GitHub] [spark] HyukjinKwon commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950035199


   ok to test




[GitHub] [spark] HyukjinKwon closed pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #34353:
URL: https://github.com/apache/spark/pull/34353


   




[GitHub] [spark] AmplabJenkins commented on pull request #34353: [SPARK-37084][SQL] Set spark.sql.files.openCostInBytes to bytesConf

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34353:
URL: https://github.com/apache/spark/pull/34353#issuecomment-950119420


   Can one of the admins verify this patch?

