Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/06/07 12:20:34 UTC

[GitHub] [spark] cxzl25 opened a new pull request, #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

cxzl25 opened a new pull request, #36787:
URL: https://github.com/apache/spark/pull/36787

   ### What changes were proposed in this pull request?
   Add a unit test (UT) that checks whether the `Overflow of newLength` problem is fixed.
   
   
   ### Why are the changes needed?
   https://github.com/apache/spark/pull/36772#pullrequestreview-996975725
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Added a UT.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r903240714


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   Thanks for the feedback. I'll look into how to reduce its memory usage.




[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891950939


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   > In addition, we can make the test pass with hive-storage-api 2.8.1 too.
   
   I also tried upgrading to 2.8.1 yesterday and ran the tests; some tests in `OrcFilterSuite` broke.
   
   The cause is [HIVE-24458](https://issues.apache.org/jira/browse/HIVE-24458), which changes the output of the `SearchArgument#toString` method, so some test comparisons fail.
   
   There are two ways to fix this:
   1. Modify the expected values in the suite.
   2. Use `SearchArgumentImpl#toOldString`, provided by HIVE-24458.
   
   I have tried the second approach:
   
   https://github.com/cxzl25/spark/commit/0adfdf60762faef2450f2f3c54c68f1109c5092e
   
   




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891607243


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   I'd expect something like `BytesColumnVector should not throw RuntimeException due to overflow`.
   ```
   Caused by: java.lang.RuntimeException: Overflow of newLength. smallBuffer.length=1073741824, nextElemLength=1048576
    	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(BytesColumnVector.java:311)
   ```




[GitHub] [spark] wangyum commented on pull request #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

Posted by GitBox <gi...@apache.org>.

wangyum commented on PR #36787:
URL: https://github.com/apache/spark/pull/36787#issuecomment-1148767712

   Version 2.7.2 throws a runtime exception:
   ```
   22:38:20.734 ERROR org.apache.spark.util.Utils: Aborting task
   java.lang.RuntimeException: Overflow of newLength. smallBuffer.length=1073741824, nextElemLength=1048576
   	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.increaseBufferSpace(BytesColumnVector.java:311) ~[hive-storage-api-2.7.2.jar:2.7.2]
   	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:182) ~[hive-storage-api-2.7.2.jar:2.7.2]
   	at org.apache.orc.mapred.OrcMapredRecordWriter.setBinaryValue(OrcMapredRecordWriter.java:87) ~[orc-mapreduce-1.7.4.jar:1.7.4]
   	at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:235) ~[orc-mapreduce-1.7.4.jar:1.7.4]
   ```
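
The numbers in this trace hint at why the check fires: `smallBuffer.length=1073741824` is 2^30, and doubling that in Java `int` arithmetic wraps to a negative value. The sketch below only illustrates that arithmetic. It assumes the buffer grows by doubling, which this thread does not show, and the class and method names are hypothetical rather than part of `hive-storage-api`.

```java
public class NewLengthOverflowDemo {
    // Plain int doubling: silently wraps around when the result exceeds Integer.MAX_VALUE.
    public static int doubleAsInt(int length) {
        return length * 2;
    }

    // Checked doubling: Math.multiplyExact throws ArithmeticException on int overflow.
    public static boolean doublingOverflows(int length) {
        try {
            Math.multiplyExact(length, 2);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int smallBufferLength = 1073741824; // 2^30, value taken from the log above
        System.out.println(doubleAsInt(smallBufferLength));           // prints -2147483648
        System.out.println(doublingOverflows(smallBufferLength));     // prints true
        System.out.println(doublingOverflows(smallBufferLength / 2)); // prints false
    }
}
```

A wrapped, negative length is the kind of condition an overflow guard has to detect before a new array is allocated.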



[GitHub] [spark] LuciferYang commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r903411525


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   @cxzl25 In https://github.com/apache/spark/pull/36954, I am trying to solve the issue by changing JVM options. If you have solved this issue another way, please let me know. Thanks!
   
   




[GitHub] [spark] AmplabJenkins commented on pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on PR #36787:
URL: https://github.com/apache/spark/pull/36787#issuecomment-1149442173

   Can one of the admins verify this patch?



[GitHub] [spark] dongjoon-hyun commented on pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on PR #36787:
URL: https://github.com/apache/spark/pull/36787#issuecomment-1150615271

   Merged to master.



[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r901190636


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   I saw one test failure with this in JDK 11:
   
   ```
   [info] - SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow *** FAILED *** (3 seconds, 393 milliseconds)
   [info]   org.apache.spark.SparkException: Job aborted.
   [info]   at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:593)
   [info]   at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:279)
   [info]   at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:186)
   [info]   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:113)
   [info]   at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:111)
   [info]   at org.apache.spark.sql.execution.command.DataWritingCommandExec.executeCollect(commands.scala:125)
   [info]   at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
   [info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:111)
   [info]   at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:171)
   [info]   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
   [info]   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
   [info]   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
   [info]   at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
   [info]   at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
   [info]   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
   [info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
   ```
   
   https://github.com/apache/spark/runs/6919076419?check_suite_focus=true
   
   I filed SPARK-39515 in JIRA. It would be great if we could identify whether this was just flakiness or an actual test failure with JDK 11.




[GitHub] [spark] dongjoon-hyun closed pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun closed pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190
URL: https://github.com/apache/spark/pull/36787




[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r903574621


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   This UT requires a large heap because it writes ORC end to end to ensure test coverage.
   
   Maybe we can mark it as `ignore` for now.
   
   According to this comment, https://github.com/apache/spark/pull/36772#issuecomment-1147811164, Hive also increased `xmx` to make its UT pass.
   
   I tried reducing the size of each batch write, but the Spark-to-ORC write path still holds some temporary buffers, so it may still hit an OOM.
   And if the OOM error is intercepted, the Spark process exits abnormally and the UT fails.
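
One way to act on the `ignore` idea without disabling the test everywhere would be to run it only when the JVM has enough heap. The sketch below is an illustration, not something proposed in this thread: the class name, the helper, and the 4 GiB threshold are all hypothetical, and the thread does not state how much heap the test actually needs. It relies only on `Runtime.getRuntime().maxMemory()` from the Java standard library.

```java
public class HeapGuardDemo {
    // Hypothetical threshold; the real heap requirement of the test is not given in the thread.
    public static final long REQUIRED_HEAP_BYTES = 4L * 1024 * 1024 * 1024;

    // Returns true when the maximum heap is at least as large as the required size.
    public static boolean hasEnoughHeap(long maxHeapBytes, long requiredBytes) {
        return maxHeapBytes >= requiredBytes;
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will attempt to use, in bytes.
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (hasEnoughHeap(maxHeap, REQUIRED_HEAP_BYTES)) {
            System.out.println("Heap is large enough; run the memory-heavy test.");
        } else {
            System.out.println("Heap is too small; skip the memory-heavy test.");
        }
    }
}
```

Checking up front avoids relying on catching `OutOfMemoryError`, which the comment above notes does not end well for the Spark process.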




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r902056954


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   Thanks, man!




[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r901191443


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   Thanks for reporting the problem. I'll look into it.




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #36787: [SPARK-39387][BUILD][FOLLOWUP] Upgrade hive-storage-api to 2.7.3

Posted by GitBox <gi...@apache.org>.

dongjoon-hyun commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r891605317


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: Upgrade hive-storage-api to 2.7.3") {

Review Comment:
   Please revise the test case name too. A test case name should describe what the test case body checks.
   
   For example, we can add the test case as an `ignored` test before upgrading `hive-storage-api`. In addition, we can make the test pass with `hive-storage-api` 2.8.1 too.




[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r901783862


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   I tested it locally with JDK 11 and it ran successfully.
   ```bash
   setjdk 1.11
   build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.orc.OrcV1QuerySuite test
   ```
   ![image](https://user-images.githubusercontent.com/3898450/174632802-10abcf43-d1df-4b1b-a8ac-097f240338d6.png)
   
   
   
   The GA (GitHub Actions) failure I saw occurred because the write hit an OOM, which should have nothing to do with JDK 11.
   ```java
   2022-06-16T14:30:19.8285352Z Caused by: java.lang.OutOfMemoryError: Java heap space
   2022-06-16T14:30:19.8285963Z 	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.allocateBuffer(BytesColumnVector.java:300)
   2022-06-16T14:30:19.8286885Z 	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureValPreallocated(BytesColumnVector.java:218)
   2022-06-16T14:30:19.8287675Z 	at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:182)
   2022-06-16T14:30:19.8288377Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setBinaryValue(OrcMapredRecordWriter.java:87)
   2022-06-16T14:30:19.8289257Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:235)
   2022-06-16T14:30:19.8289956Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setStructValue(OrcMapredRecordWriter.java:133)
   2022-06-16T14:30:19.8290654Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:248)
   2022-06-16T14:30:19.8291438Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setListValue(OrcMapredRecordWriter.java:162)
   2022-06-16T14:30:19.8292127Z 	at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:256)
   2022-06-16T14:30:19.8292824Z 	at org.apache.orc.mapreduce.OrcMapreduceRecordWriter.write(OrcMapreduceRecordWriter.java:73)
   2022-06-16T14:30:19.8293554Z 	at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.write(OrcOutputWriter.scala:56)
   2022-06-16T14:30:19.8294523Z 	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
   ```
   
   Unlike PR #34284, this test does not seem able to shrink its buffer memory; it needs a relatively large amount of memory to write to ORC to ensure test coverage.
   
   




[GitHub] [spark] LuciferYang commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

LuciferYang commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r903232263


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   > ```shell
   > build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.orc.OrcV1QuerySuite test
   > ```
   
   
   When running `mvn test` on the whole `sql/core` module, it is easy to reproduce the OOM in this test case, even with Java 8. I think we should consider simplifying the test case or increasing the test memory.
   




[GitHub] [spark] HyukjinKwon commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

Posted by GitBox <gi...@apache.org>.

HyukjinKwon commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r903490345


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   Is there any way to avoid allocating so much memory in the test? If not, we can also `ignore` it for now.


