Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/12/23 13:06:37 UTC

[GitHub] [spark] yym1995 opened a new pull request #35002: [SPARK-37728][SQL] reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

yym1995 opened a new pull request #35002:
URL: https://github.com/apache/spark/pull/35002


   
   ### What changes were proposed in this pull request?
   `initBatch` is called only once, when an `OrcColumnarBatchReader` is created. In `initBatch`:
   
   `orcVectorWrappers[i] = OrcColumnVectorUtils.toOrcColumnVector(dt, wrap.batch().cols[colId]);`
   
   When the second argument of `toOrcColumnVector` is a `ListColumnVector` or `MapColumnVector`, `orcVectorWrappers[i]` is initialized with references to that vector's `offsets` and `lengths` arrays.
   
   However, when `nextBatch` of `OrcColumnarBatchReader` is called, `ensureSize` of `ColumnVector` (or of a subclass such as `MultiValuedColumnVector`) may be called. After that, the `offsets` and `lengths` fields of the `ListColumnVector`/`MapColumnVector` can refer to new array objects, while the wrapper still holds the old ones. Indexing the old arrays can then throw `ArrayIndexOutOfBoundsException`.
   
   This PR makes `OrcArrayColumnVector.getArray` and `OrcMapColumnVector.getMap` always read `offsets` and `lengths` from the underlying `ColumnVector`, which resolves this issue.
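
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the Spark or ORC code: `GrowableListVector`, `CachingWrapper`, and `DelegatingWrapper` are hypothetical stand-ins, and the single-argument `ensureSize` here is a simplification assumed for illustration. It only shows why a reference captured at construction time goes stale once the vector reallocates its arrays.

```java
import java.util.Arrays;

public class StaleOffsetsSketch {
  /** Hypothetical stand-in for a list vector whose public arrays may be replaced. */
  static class GrowableListVector {
    long[] offsets;
    long[] lengths;

    GrowableListVector(int size) {
      offsets = new long[size];
      lengths = new long[size];
    }

    /** Grows by allocating new arrays, so previously captured references go stale. */
    void ensureSize(int size) {
      if (size > offsets.length) {
        offsets = Arrays.copyOf(offsets, size);
        lengths = Arrays.copyOf(lengths, size);
      }
    }
  }

  /** Pre-fix pattern: captures the array reference once, at construction time. */
  static class CachingWrapper {
    private final long[] offsets;
    CachingWrapper(GrowableListVector v) { this.offsets = v.offsets; }
    long offsetAt(int ordinal) { return offsets[ordinal]; }
  }

  /** Post-fix pattern: goes through the vector on every read. */
  static class DelegatingWrapper {
    private final GrowableListVector vector;
    DelegatingWrapper(GrowableListVector v) { this.vector = v; }
    long offsetAt(int ordinal) { return vector.offsets[ordinal]; }
  }

  /** Returns true if the caching wrapper throws after the vector has grown. */
  static boolean cachingWrapperFailsAfterGrowth() {
    GrowableListVector v = new GrowableListVector(2);
    CachingWrapper cached = new CachingWrapper(v);
    v.ensureSize(4);      // v.offsets now points at a new 4-element array
    v.offsets[3] = 42L;
    try {
      cached.offsetAt(3); // still indexes the old 2-element array
      return false;
    } catch (ArrayIndexOutOfBoundsException e) {
      return true;
    }
  }

  /** Returns the value read through the delegating wrapper after growth. */
  static long delegatingWrapperReadsAfterGrowth() {
    GrowableListVector v = new GrowableListVector(2);
    DelegatingWrapper delegating = new DelegatingWrapper(v);
    v.ensureSize(4);
    v.offsets[3] = 42L;
    return delegating.offsetAt(3);
  }

  public static void main(String[] args) {
    System.out.println("caching wrapper fails: " + cachingWrapperFailsAfterGrowth());
    System.out.println("delegating wrapper reads: " + delegatingWrapperReadsAfterGrowth());
  }
}
```

The delegating variant costs one extra field dereference per read, which is the trade-off this sketch assumes is acceptable in exchange for never holding a stale array.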
   
   ### Why are the changes needed?
   
   Bugfix
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Tested manually. Just follow the bug reproduction steps in https://issues.apache.org/jira/browse/SPARK-37728 .
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001522608


   The fix LGTM. Please click the failed GitHub Action and follow the instructions to fix the issues.




[GitHub] [spark] dongjoon-hyun closed pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun closed pull request #35002:
URL: https://github.com/apache/spark/pull/35002


   




[GitHub] [spark] yym1995 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
yym1995 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001302948


   @c21 Thank you for the feedback! I have already changed the code structure.




[GitHub] [spark] yym1995 commented on pull request #35002: [SPARK-37728][SQL] reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
yym1995 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000307071


   @c21  Could you take a look when you are free? Thanks! Looking forward to your feedback.




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000665700


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51025/
   




[GitHub] [spark] dongjoon-hyun commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001887178


   I revised the PR description. Merged to master.
   Thank you, @yym1995 and all!




[GitHub] [spark] yym1995 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
yym1995 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001885585


   Now all checks have passed. cc @cloud-fan @dongjoon-hyun @HyukjinKwon @viirya 




[GitHub] [spark] HyukjinKwon commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000608258








[GitHub] [spark] viirya edited a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
viirya edited a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001740353


   GA seems unstable now. You can submit an empty commit to re-trigger it.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001887807


   Welcome to the Apache Spark community, @yym1995 . 
   I added you to the Apache Spark contributor group and assigned SPARK-37728 to you.
   Could you make two backporting PRs to `branch-3.2` and `branch-3.1`? We need to pass the UTs on those branches.




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000654425


   **[Test build #146550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146550/testReport)** for PR 35002 at commit [`ec2e0db`](https://github.com/apache/spark/commit/ec2e0db0244a64b7cbc7a8325d20fadb55f956c3).




[GitHub] [spark] AmplabJenkins commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000639245


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51020/
   




[GitHub] [spark] AmplabJenkins commented on pull request #35002: [SPARK-37728][SQL] reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000292699


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000764117


   **[Test build #146550 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146550/testReport)** for PR 35002 at commit [`ec2e0db`](https://github.com/apache/spark/commit/ec2e0db0244a64b7cbc7a8325d20fadb55f956c3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000292699


   Can one of the admins verify this patch?




[GitHub] [spark] yym1995 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
yym1995 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000645608


   > @yym1995 thank you for submitting a fix! Could you help add a unit test case as well?
   
   I just added a unit test case. Please take a look, thanks!




[GitHub] [spark] AmplabJenkins commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000674075


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000686492


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51025/
   




[GitHub] [spark] SparkQA removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000654425


   **[Test build #146550 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146550/testReport)** for PR 35002 at commit [`ec2e0db`](https://github.com/apache/spark/commit/ec2e0db0244a64b7cbc7a8325d20fadb55f956c3).




[GitHub] [spark] dongjoon-hyun commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001887807


   Welcome to the Apache Spark community, @yym1995 . 
   I added you to the Apache Spark contributor group and assigned SPARK-37728 to you.
   Could you make backporting PRs to `branch-3.2` and `branch-3.1`? We need to pass the UTs on those branches.




[GitHub] [spark] c21 commented on a change in pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #35002:
URL: https://github.com/apache/spark/pull/35002#discussion_r775301600



##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcMapColumnVector.java
##########
@@ -32,28 +33,30 @@
 public class OrcMapColumnVector extends OrcColumnVector {
   private final OrcColumnVector keys;
   private final OrcColumnVector values;
-  private final long[] offsets;
-  private final long[] lengths;
+  private MapColumnVector mapData;
 
   OrcMapColumnVector(
       DataType type,
       ColumnVector vector,
       OrcColumnVector keys,
-      OrcColumnVector values,
-      long[] offsets,
-      long[] lengths) {
+      OrcColumnVector values) {
 
     super(type, vector);
 
     this.keys = keys;
     this.values = values;
-    this.offsets = offsets;
-    this.lengths = lengths;
+
+    if (vector instanceof MapColumnVector) {
+      mapData = (MapColumnVector) vector;
+    } else {
+      throw new UnsupportedOperationException();
+    }
   }
 
   @Override
   public ColumnarMap getMap(int ordinal) {
-    return new ColumnarMap(keys, values, (int) offsets[ordinal], (int) lengths[ordinal]);
+    return new ColumnarMap(keys, values, (int) mapData.offsets[ordinal],

Review comment:
       ```java
   public ColumnarMap getMap(int ordinal) {
     int offset = (int) ((MapColumnVector) baseData).offsets[ordinal];
     int length = (int) ((MapColumnVector) baseData).lengths[ordinal];
     return new ColumnarMap(keys, values, offset, length);
   }
   ```

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcArrayColumnVector.java
##########
@@ -31,26 +32,27 @@
  */
 public class OrcArrayColumnVector extends OrcColumnVector {
   private final OrcColumnVector data;
-  private final long[] offsets;
-  private final long[] lengths;
+  private ListColumnVector listData;
 
   OrcArrayColumnVector(

Review comment:
       I think we don't need to store another copy of `vector` here. We can change [`OrcColumnVector.baseData`](https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnVector.java#L32) from `private` to `protected`, and just use `baseData` to get offset and length. So how about defining the constructor like below:
   
   ```java
   OrcArrayColumnVector(
       DataType type,
       ListColumnVector vector,
       OrcColumnVector data) {
   
     super(type, vector);
   
     this.data = data;
   }
   ```

##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcMapColumnVector.java
##########
@@ -32,28 +33,30 @@
 public class OrcMapColumnVector extends OrcColumnVector {
   private final OrcColumnVector keys;
   private final OrcColumnVector values;
-  private final long[] offsets;
-  private final long[] lengths;
+  private MapColumnVector mapData;
 
   OrcMapColumnVector(

Review comment:
       Similarly we can define `OrcMapColumnVector` constructor as below:
   
   ```java
   OrcMapColumnVector(
       DataType type,
       MapColumnVector vector,
       OrcColumnVector keys,
       OrcColumnVector values) {
   
     super(type, vector);
   
     this.keys = keys;
     this.values = values;
   }
   ```
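
The constructor and accessor suggestions above fit together as follows. This is a self-contained sketch with hypothetical stand-in types (`Vector`, `MapVector`, `BaseWrapper`, `MapWrapper`) rather than the real ORC and Spark classes, and the sample offsets and lengths are made up. It assumes the base class exposes its wrapped vector as a `protected` field, and shows that narrowing the constructor parameter type is what keeps the cast in the accessor from failing.

```java
public class BaseDataSketch {
  /** Stand-in for the generic column vector type. */
  static class Vector {}

  /** Stand-in for a map vector; only the two arrays used here, with made-up values. */
  static class MapVector extends Vector {
    long[] offsets = {0L, 2L};
    long[] lengths = {2L, 3L};
  }

  /** Stand-in for the base wrapper, with its field widened from private to protected. */
  static class BaseWrapper {
    protected final Vector baseData;
    BaseWrapper(Vector vector) { this.baseData = vector; }
  }

  /** Stand-in for the map wrapper: no second copy of the vector is stored. */
  static class MapWrapper extends BaseWrapper {
    // The parameter type is MapVector, so baseData is always a MapVector here.
    MapWrapper(MapVector vector) { super(vector); }

    /** Returns {offset, length} for the given row, read from the vector on each call. */
    int[] range(int ordinal) {
      MapVector map = (MapVector) baseData;
      return new int[] {(int) map.offsets[ordinal], (int) map.lengths[ordinal]};
    }
  }

  /** Convenience entry point: the range of the second row of the sample vector. */
  static int[] rangeOfSecondRow() {
    return new MapWrapper(new MapVector()).range(1);
  }

  public static void main(String[] args) {
    int[] range = rangeOfSecondRow();
    System.out.println(range[0] + ", " + range[1]);
  }
}
```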






[GitHub] [spark] AmplabJenkins removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001030918


   Can one of the admins verify this patch?




[GitHub] [spark] viirya commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001740353


   GA seems unstable. You can submit an empty commit to re-trigger it.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000767198


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146550/
   




[GitHub] [spark] AmplabJenkins commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000767198


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146550/
   




[GitHub] [spark] c21 commented on pull request #35002: [SPARK-37728][SQL] reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000519891


   @yym1995 thank you for submitting a fix! Could you help add a unit test case as well?




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000639229


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51020/
   




[GitHub] [spark] yym1995 edited a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
yym1995 edited a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000645608


   > @yym1995 thank you for submitting a fix! Could you help add a unit test case as well?
   
   @c21 I just added a unit test case. Please take a look, thanks!




[GitHub] [spark] c21 commented on a change in pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #35002:
URL: https://github.com/apache/spark/pull/35002#discussion_r775301890



##########
File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcArrayColumnVector.java
##########
@@ -31,26 +32,27 @@
  */
 public class OrcArrayColumnVector extends OrcColumnVector {
   private final OrcColumnVector data;
-  private final long[] offsets;
-  private final long[] lengths;
+  private ListColumnVector listData;
 
   OrcArrayColumnVector(
       DataType type,
       ColumnVector vector,
-      OrcColumnVector data,
-      long[] offsets,
-      long[] lengths) {
+      OrcColumnVector data) {
 
     super(type, vector);
 
     this.data = data;
-    this.offsets = offsets;
-    this.lengths = lengths;
+
+    if (vector instanceof ListColumnVector) {
+      listData = (ListColumnVector) vector;
+    } else {
+      throw new UnsupportedOperationException();
+    }
   }
 
   @Override
   public ColumnarArray getArray(int rowId) {
-    return new ColumnarArray(data, (int) offsets[rowId], (int) lengths[rowId]);
+    return new ColumnarArray(data, (int) listData.offsets[rowId], (int) listData.lengths[rowId]);

Review comment:
       We can just use `baseData` here:
   
   ```java
   public ColumnarArray getArray(int rowId) {
     int offset = (int) ((ListColumnVector) baseData).offsets[rowId];
     int length = (int) ((ListColumnVector) baseData).lengths[rowId];
     return new ColumnarArray(data, offset, length);
   }
   ```
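
The diff above stops capturing `offsets` and `lengths` in the constructor and instead reads them through the list vector on every `getArray` call. One plausible motivation, assumed here rather than stated in this thread, is that the underlying vector may swap in larger arrays when a later batch needs more rows, which would leave cached references pointing at the old, shorter arrays. The sketch below illustrates that failure mode with hypothetical stand-in classes (`GrowableListVector`, `CachingReader`, `LiveReader`); none of them are Spark or ORC types.

```java
import java.util.Arrays;

public class StaleArraySketch {

  /** Hypothetical stand-in for a list vector that replaces its arrays when it grows. */
  static class GrowableListVector {
    long[] offsets;
    long[] lengths;

    GrowableListVector(int size) {
      offsets = new long[size];
      lengths = new long[size];
    }

    /** Grows by allocating new arrays; references to the old arrays become stale. */
    void ensureSize(int size) {
      if (size > offsets.length) {
        offsets = Arrays.copyOf(offsets, size);
        lengths = Arrays.copyOf(lengths, size);
      }
    }
  }

  /** Captures the array references once, like the fields removed by the diff. */
  static class CachingReader {
    private final long[] offsets;
    private final long[] lengths;

    CachingReader(GrowableListVector vector) {
      this.offsets = vector.offsets;
      this.lengths = vector.lengths;
    }

    long[] range(int rowId) {
      return new long[] {offsets[rowId], lengths[rowId]};
    }
  }

  /** Looks the arrays up through the vector on every call, like the suggested getArray. */
  static class LiveReader {
    private final GrowableListVector vector;

    LiveReader(GrowableListVector vector) {
      this.vector = vector;
    }

    long[] range(int rowId) {
      return new long[] {vector.offsets[rowId], vector.lengths[rowId]};
    }
  }

  public static void main(String[] args) {
    GrowableListVector vector = new GrowableListVector(2);
    CachingReader cached = new CachingReader(vector);
    LiveReader live = new LiveReader(vector);

    // A later, larger batch forces the vector to reallocate its arrays.
    vector.ensureSize(4);
    vector.offsets[3] = 10;
    vector.lengths[3] = 5;

    // The live reader sees the new arrays.
    System.out.println(Arrays.toString(live.range(3)));

    // The caching reader still holds the old two-element arrays.
    try {
      cached.range(3);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("stale cached arrays: " + e.getMessage());
    }
  }
}
```

Under these assumptions, holding a reference to the vector and dereferencing at access time costs one extra field read per call but cannot go stale.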






[GitHub] [spark] dongjoon-hyun commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001919239


   Yes, we only need to backport to the branches where the test case fails.
   
   BTW, @yym1995 , if `branch-3.2` is the only affected branch, you should remove the other versions from `Affected Version` of SPARK-37728. Currently, it lists 3.0.2 and 3.1.2.




[GitHub] [spark] AmplabJenkins commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001030918


   Can one of the admins verify this patch?




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000682808


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51025/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000674075








[GitHub] [spark] AmplabJenkins removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000639245


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/51020/
   




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000624597


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/51020/
   




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001919239


   Yes, we only need to backport to the branches where the test case fails.
   
   BTW, @yym1995 , if `branch-3.2` is the only affected branch, you should remove the other versions from `Affected Version` of SPARK-37728. Currently, it lists 3.0.2 and 3.1.2.




[GitHub] [spark] viirya commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001740733


   > Tested manually. Just follow the bug reproduction steps in https://issues.apache.org/jira/browse/SPARK-37728 .
   
   Since you added a unit test, you can update the PR description too.




[GitHub] [spark] dongjoon-hyun edited a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun edited a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001887807


   Welcome to the Apache Spark community, @yym1995 . 
   I added you to the Apache Spark contributor group and assigned SPARK-37728 to you.
   Could you make three backporting PRs, one each to `branch-3.2`, `branch-3.1`, and `branch-3.0`? We need to pass the UTs on those branches.




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000694542


   **[Test build #146545 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146545/testReport)** for PR 35002 at commit [`f975af9`](https://github.com/apache/spark/commit/f975af95c921ab8944f19e941403bf10b58518a3).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000608978


   **[Test build #146545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146545/testReport)** for PR 35002 at commit [`f975af9`](https://github.com/apache/spark/commit/f975af95c921ab8944f19e941403bf10b58518a3).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000708387


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146545/
   




[GitHub] [spark] AmplabJenkins commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000708387


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/146545/
   




[GitHub] [spark] SparkQA commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1000608978


   **[Test build #146545 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/146545/testReport)** for PR 35002 at commit [`f975af9`](https://github.com/apache/spark/commit/f975af95c921ab8944f19e941403bf10b58518a3).




[GitHub] [spark] yym1995 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
yym1995 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001305766


   @dongjoon-hyun This PR fixes a bug in the ORC vectorized reader. @c21 has reviewed it, and I have updated the code according to the feedback. Could you merge this PR? Thanks!




[GitHub] [spark] c21 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001306156


   LGTM, pending CI tests. cc @viirya and @cloud-fan as well.




[GitHub] [spark] c21 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001306344


   retest this please




[GitHub] [spark] c21 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
c21 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001916895


   @dongjoon-hyun - thanks for merging. We only need to backport to branch-3.2, right? The related code was only introduced in the 3.2 branch (in the PR for the ORC vectorized reader of nested columns).




[GitHub] [spark] yym1995 commented on pull request #35002: [SPARK-37728][SQL] Reading nested columns with ORC vectorized reader can cause ArrayIndexOutOfBoundsException

Posted by GitBox <gi...@apache.org>.
yym1995 commented on pull request #35002:
URL: https://github.com/apache/spark/pull/35002#issuecomment-1001890648


   > Welcome to the Apache Spark community, @yym1995 . I added you to the Apache Spark contributor group and assigned [SPARK-37728](https://issues.apache.org/jira/browse/SPARK-37728) to you. Could you make three backporting PRs, one each to `branch-3.2`, `branch-3.1`, and `branch-3.0`? We need to pass the UTs on those branches.
   
   OK, will do.

