Posted to reviews@spark.apache.org by "chong0929 (via GitHub)" <gi...@apache.org> on 2023/03/08 16:28:02 UTC

[GitHub] [spark] chong0929 opened a new pull request, #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

chong0929 opened a new pull request, #40341:
URL: https://github.com/apache/spark/pull/40341

   ### What changes were proposed in this pull request?
   In ORC batch reads, byte arrays are used to store the data of the columns being read. When the total data size of a batch exceeds Int.MaxValue, a NegativeArraySizeException can be thrown. This change catches it and rethrows the same exception type with a friendlier message.
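   
   A minimal Java sketch of the overflow described above, assuming the batch buffer size is accumulated in an `int`. The class and method names (`BatchSizeOverflow`, `intTotal`, `longTotal`), the 4096-row batch, and the 600,000-byte values are illustrative assumptions, not ORC's actual code. Summing past Int.MaxValue wraps to a negative number, and allocating an array of that size throws NegativeArraySizeException:

```java
public class BatchSizeOverflow {

  /**
   * Illustrative assumption (not ORC's real code): the batch buffer size is
   * accumulated in an int, one value at a time.
   */
  public static int intTotal(int rows, int bytesPerValue) {
    int total = 0;
    for (int i = 0; i < rows; i++) {
      total += bytesPerValue; // wraps silently once it passes Integer.MAX_VALUE
    }
    return total;
  }

  /** The same size computed in a long, so the true value is never lost. */
  public static long longTotal(int rows, int bytesPerValue) {
    return (long) rows * bytesPerValue;
  }

  public static void main(String[] args) {
    int rows = 4096;             // assumed rows per batch
    int bytesPerValue = 600_000; // assumed ~600 KB value per row
    int wrapped = intTotal(rows, bytesPerValue);

    System.out.println("true size: " + longTotal(rows, bytesPerValue)); // 2457600000
    System.out.println("int size:  " + wrapped);                        // -1837367296

    try {
      byte[] buffer = new byte[wrapped]; // negative length -> exception
      System.out.println("allocated " + buffer.length);
    } catch (NegativeArraySizeException e) {
      System.out.println("NegativeArraySizeException: " + e.getMessage());
    }
  }
}
```

   Computing the size in a `long`, or with `Math.addExact`, which throws `ArithmeticException` on overflow, detects the condition before the allocation is attempted.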
   
   
   ### Why are the changes needed?
   To give users a friendlier message when reading an ORC file fails with java.lang.NegativeArraySizeException.
   
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   Existing tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] LuciferYang commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #40341:
URL: https://github.com/apache/spark/pull/40341#discussion_r1130576248


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java:
##########
@@ -204,7 +204,12 @@ public void initBatch(
    * by copying from ORC VectorizedRowBatch columns to Spark ColumnarBatch columns.
    */
   private boolean nextBatch() throws IOException {
-    recordReader.nextBatch(wrap.batch());
+    try {

Review Comment:
   Will Parquet have the same issue?
   
   



[GitHub] [spark] zzzzming95 commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

Posted by "zzzzming95 (via GitHub)" <gi...@apache.org>.
zzzzming95 commented on code in PR #40341:
URL: https://github.com/apache/spark/pull/40341#discussion_r1131247723


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java:
##########
@@ -204,7 +204,12 @@ public void initBatch(
    * by copying from ORC VectorizedRowBatch columns to Spark ColumnarBatch columns.
    */
   private boolean nextBatch() throws IOException {
-    recordReader.nextBatch(wrap.batch());
+    try {
+      recordReader.nextBatch(wrap.batch());
+    } catch (NegativeArraySizeException e) {

Review Comment:
   Is there a way to build a unit test that triggers and catches the exception?
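
   A minimal sketch of such a test, assuming the reader can be swapped for a stub so that no multi-gigabyte ORC file is needed. `BatchSource`, `nextBatchWithHint`, and `HINT` are hypothetical names standing in for ORC's record reader and the patched `nextBatch()`; the PR's actual message text is not shown in this thread, so `HINT` is a placeholder. Plain `AssertionError` checks replace a test framework to keep it self-contained:

```java
import java.io.IOException;

public class FriendlyNegativeArraySizeTest {

  /** Hypothetical stand-in for ORC's record reader; not the real ORC API. */
  public interface BatchSource {
    boolean nextBatch() throws IOException;
  }

  /** Placeholder hint text (assumption); the PR's real message is not shown. */
  public static final String HINT =
      "Batch data exceeded Integer.MAX_VALUE bytes; try reading fewer rows per batch.";

  /** Hypothetical wrapper mirroring the proposed try/catch around nextBatch(). */
  public static boolean nextBatchWithHint(BatchSource source) throws IOException {
    try {
      return source.nextBatch();
    } catch (NegativeArraySizeException e) {
      // Rethrow the same exception type with a friendlier message.
      NegativeArraySizeException friendly = new NegativeArraySizeException(HINT);
      friendly.initCause(e); // keep the original stack trace reachable
      throw friendly;
    }
  }

  public static void main(String[] args) throws IOException {
    // Stub that fails the way an oversized batch would, without allocating gigabytes.
    BatchSource failing = () -> { throw new NegativeArraySizeException("-1837367296"); };
    try {
      nextBatchWithHint(failing);
      throw new AssertionError("expected NegativeArraySizeException");
    } catch (NegativeArraySizeException e) {
      if (!HINT.equals(e.getMessage())) throw new AssertionError("wrong message");
      if (!(e.getCause() instanceof NegativeArraySizeException)) {
        throw new AssertionError("original exception lost");
      }
    }

    // A healthy source must pass its result through untouched.
    if (!nextBatchWithHint(() -> true)) throw new AssertionError("result not forwarded");
    System.out.println("ok");
  }
}
```

   Rethrowing the same exception type keeps any existing `catch` sites working, and `initCause` preserves the original stack trace for debugging.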



[GitHub] [spark] chong0929 commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

Posted by "chong0929 (via GitHub)" <gi...@apache.org>.
chong0929 commented on code in PR #40341:
URL: https://github.com/apache/spark/pull/40341#discussion_r1131896943


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java:
##########
@@ -204,7 +204,12 @@ public void initBatch(
    * by copying from ORC VectorizedRowBatch columns to Spark ColumnarBatch columns.
    */
   private boolean nextBatch() throws IOException {
-    recordReader.nextBatch(wrap.batch());
+    try {

Review Comment:
   Good point, I will add a test.



[GitHub] [spark] smallzhongfeng commented on a diff in pull request #40341: [WIP][SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

Posted by "smallzhongfeng (via GitHub)" <gi...@apache.org>.
smallzhongfeng commented on code in PR #40341:
URL: https://github.com/apache/spark/pull/40341#discussion_r1142974426


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java:
##########
@@ -204,7 +204,12 @@ public void initBatch(
    * by copying from ORC VectorizedRowBatch columns to Spark ColumnarBatch columns.
    */
   private boolean nextBatch() throws IOException {
-    recordReader.nextBatch(wrap.batch());
+    try {
+      recordReader.nextBatch(wrap.batch());
+    } catch (NegativeArraySizeException e) {

Review Comment:
   I also encountered the same issue with the same stack trace. How much adjustment would be appropriate? @chong0929 



[GitHub] [spark] github-actions[bot] closed pull request #40341: [WIP][SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #40341: [WIP][SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException
URL: https://github.com/apache/spark/pull/40341


[GitHub] [spark] github-actions[bot] commented on pull request #40341: [WIP][SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #40341:
URL: https://github.com/apache/spark/pull/40341#issuecomment-1613947367

   We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!


[GitHub] [spark] chong0929 commented on a diff in pull request #40341: [SPARK-42715][SQL] Tips for Optimizing NegativeArraySizeException

Posted by "chong0929 (via GitHub)" <gi...@apache.org>.
chong0929 commented on code in PR #40341:
URL: https://github.com/apache/spark/pull/40341#discussion_r1131900349


##########
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java:
##########
@@ -204,7 +204,12 @@ public void initBatch(
    * by copying from ORC VectorizedRowBatch columns to Spark ColumnarBatch columns.
    */
   private boolean nextBatch() throws IOException {
-    recordReader.nextBatch(wrap.batch());
+    try {
+      recordReader.nextBatch(wrap.batch());
+    } catch (NegativeArraySizeException e) {

Review Comment:
   Thanks for your ideas, they sound good. I will get it done.


