
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/05 07:59:32 UTC

[GitHub] [spark] cxzl25 opened a new pull request, #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

cxzl25 opened a new pull request, #36769:
URL: https://github.com/apache/spark/pull/36769

   ### What changes were proposed in this pull request?
   Introduce configuration items and set the batch size when constructing the ORC writer.
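
The idea of a writer whose batch size is supplied at construction time can be sketched as follows. This is a minimal, self-contained illustration and not Spark's ORC writer: `BatchedWriter`, `sink`, and `BatchedWriterDemo` are hypothetical names introduced only for this example.

```scala
import scala.collection.mutable.{ArrayBuffer, ListBuffer}

// Hypothetical illustration only; this is not Spark's ORC output writer.
// Rows are buffered and handed to `sink` once `batchSize` rows have accumulated.
final class BatchedWriter[A](batchSize: Int, sink: Seq[A] => Unit) {
  require(batchSize > 0, "batchSize must be positive")

  private val buffer = ArrayBuffer.empty[A]

  // Buffer one row and flush as soon as the batch is full.
  def write(row: A): Unit = {
    buffer += row
    if (buffer.size == batchSize) flush()
  }

  // Emit whatever is buffered (possibly a partial batch) and reset the buffer.
  def flush(): Unit = {
    if (buffer.nonEmpty) {
      sink(buffer.toList)
      buffer.clear()
    }
  }

  // Closing the writer flushes the trailing partial batch.
  def close(): Unit = flush()
}

object BatchedWriterDemo {
  // Returns the size of every batch emitted while writing `rows` rows.
  def flushedSizes(rows: Int, batchSize: Int): List[Int] = {
    val sizes = ListBuffer.empty[Int]
    val writer = new BatchedWriter[Int](batchSize, batch => { sizes += batch.size; () })
    (1 to rows).foreach(writer.write)
    writer.close()
    sizes.toList
  }
}
```

A smaller batch size means fewer rows are held in memory between flushes, at the cost of more flush calls.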
   
   ### Why are the changes needed?
   Currently, the batch size of the vectorized columnar ORC writer is not configurable and defaults to 1024.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing UTs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on code in PR #36769:
URL: https://github.com/apache/spark/pull/36769#discussion_r895128188


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -844,6 +844,21 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       df.write.format("orc").save(path)
     }
   }
+
+  test("SPARK-39387: Make vectorized orc columar writer batch size configurable") {

Review Comment:
   ID looks wrong to me. Could you double-check? I expected SPARK-39381.




[GitHub] [spark] cxzl25 commented on a diff in pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

Posted by GitBox <gi...@apache.org>.

cxzl25 commented on code in PR #36769:
URL: https://github.com/apache/spark/pull/36769#discussion_r895128848


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -844,6 +844,21 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       df.write.format("orc").save(path)
     }
   }
+
+  test("SPARK-39387: Make vectorized orc columar writer batch size configurable") {

Review Comment:
   Thanks, I have fixed it now.




[GitHub] [spark] dongjoon-hyun closed pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun closed pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable
URL: https://github.com/apache/spark/pull/36769



[GitHub] [spark] AmplabJenkins commented on pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on PR #36769:
URL: https://github.com/apache/spark/pull/36769#issuecomment-1146784621

   Can one of the admins verify this patch?



[GitHub] [spark] cxzl25 commented on a diff in pull request #36769: [SPARK-39381][SQL] Make vectorized orc columar writer batch size configurable

Posted by GitBox <gi...@apache.org>.

cxzl25 commented on code in PR #36769:
URL: https://github.com/apache/spark/pull/36769#discussion_r895125836


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -844,6 +844,21 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       df.write.format("orc").save(path)
     }
   }
+
+  test("SPARK-39387: Make vectorized orc columar writer batch size configurable") {
+    Seq(10, 100).foreach(batchSize => {
+      withSQLConf(SQLConf.ORC_VECTORIZED_WRITER_BATCH_SIZE.key -> batchSize.toString) {

Review Comment:
   If the default value of 1024 were used here, the write would hit an OOM. Even though the test intercepts Throwable, the OOM causes the Spark process to exit abnormally, so the test would not succeed.
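
As a rough illustration of why the batch size matters for memory, the bytes buffered per batch grow linearly with the number of rows in the batch. The sketch below is a back-of-the-envelope estimate only: `BatchMemoryEstimate` and the 1 MiB average row size are assumptions made for this example, since the thread does not state the row size used in the test.

```scala
object BatchMemoryEstimate {
  // Rough estimate of the bytes held in one in-memory batch:
  // rows per batch multiplied by the average size of a row.
  def bytesPerBatch(batchSize: Int, avgRowBytes: Long): Long =
    batchSize.toLong * avgRowBytes

  def main(args: Array[String]): Unit = {
    val mib = 1L << 20
    val avgRowBytes = mib // hypothetical 1 MiB per row, not taken from the PR
    Seq(10, 100, 1024).foreach { batchSize =>
      val buffered = bytesPerBatch(batchSize, avgRowBytes) / mib
      println(s"batchSize=$batchSize buffers about $buffered MiB")
    }
  }
}
```

Under that assumed row size, a batch of 1024 rows buffers roughly 100 times more data than a batch of 10 rows.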
   
   


