Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2016/07/28 07:17:42 UTC

[GitHub] spark pull request #14388: [SPARK-16362][SQL] Support ArrayType and StructTy...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/14388

    [SPARK-16362][SQL] Support ArrayType and StructType in vectorized Parquet reader

    ## What changes were proposed in this pull request?
    
    The vectorized Parquet reader currently does not support complex types such as ArrayType, MapType and StructType. We should support them to extend the coverage of the performance improvement introduced by the vectorized Parquet reader. This patch adds support for ArrayType and StructType first.
    
    ### Main changes
    
    * Obtain repetition and definition level information for Parquet schema
    
      In order to support complex types in the vectorized Parquet reader, we need the repetition and definition level information of the Parquet schema, which is used to encode the structure of complex types. This PR introduces a class to capture this encoding: `RepetitionDefinitionInfo`. This PR also introduces a few classes to capture the Parquet schema structure: `ParquetField`, `ParquetStruct`, `ParquetArray` and `ParquetMap`. A new method `getParquetStruct` is added to `ParquetSchemaConverter`; it creates a `ParquetStruct` object that captures the structure and metadata. The `ParquetStruct` has the same schema structure as the required schema used to guide Parquet reading, and it provides the corresponding repetition and definition levels for the fields in the required schema.
    
    * Attach `VectorizedColumnReader` to `ColumnVector`
    
      Because in a flat schema each `ColumnVector` is actually a data column, the relation between `VectorizedColumnReader` and `ColumnVector` was previously one-to-one. Now only a `ColumnVector` representing a data column has a corresponding `VectorizedColumnReader`. When it is time to read a batch, a `ColumnVector` with a complex type delegates to its child `ColumnVector`s.
    
    * Implement constructing complex records in `VectorizedColumnReader`
    
      The `readBatch` method in `VectorizedColumnReader` is the main method for reading data into a `ColumnVector`. Previously it simply loaded the required number of values according to the data type of the column vector. Now, after the data is loaded into the column, we need to construct complex records in its parent column, which could be an ArrayType, MapType or StructType. The information needed to restore the data as complex types is encoded in the repetition and definition levels in Parquet. The new method `constructComplexRecords` in `VectorizedColumnReader` implements the logic to restore the complex data. Basically, `constructComplexRecords` counts the consecutive values and adds an array to the parent column whenever the repetition level indicates that a new record begins. `constructComplexRecords` also needs to handle null values, which could mean a null record at the root level, or an empty array or struct. This method distinguishes these cases and sets the result correctly.
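    
    As a rough illustration of how repetition and definition levels can be turned back into arrays, here is a minimal, self-contained sketch. It is not the code in this PR: the class `LevelAssemblySketch`, the method `assemble` and the simplified schema in the comments are all hypothetical, chosen only to show the idea for a single nullable array-of-int column.
    
    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class LevelAssemblySketch {
        /**
         * Rebuilds the rows of one nullable int-array column from flat column data.
         *
         * Assumed (hypothetical) schema: optional group a { repeated int32 element; }
         * so the max repetition level is 1 and the max definition level is 2:
         *   defLevel 0 -> the array itself is null
         *   defLevel 1 -> the array exists but is empty
         *   defLevel 2 -> an element value is present
         *   repLevel 0 -> this entry starts a new row
         *   repLevel 1 -> this entry continues the current array
         */
        public static List<List<Integer>> assemble(int[] repLevels, int[] defLevels, int[] values) {
            List<List<Integer>> rows = new ArrayList<>();
            List<Integer> current = null;
            int valueIdx = 0; // values are stored only for entries with defLevel == 2
            for (int i = 0; i < repLevels.length; i++) {
                if (repLevels[i] == 0) {
                    // A repetition level of 0 starts a new top-level record.
                    current = (defLevels[i] == 0) ? null : new ArrayList<>();
                    rows.add(current);
                }
                if (defLevels[i] == 2) {
                    // Only fully defined entries consume a value.
                    current.add(values[valueIdx++]);
                }
            }
            return rows;
        }

        public static void main(String[] args) {
            // Encodes the four rows [1, 2], null, [], [3].
            int[] rep = {0, 1, 0, 0, 0};
            int[] def = {2, 2, 0, 1, 2};
            int[] vals = {1, 2, 3};
            System.out.println(assemble(rep, def, vals)); // prints [[1, 2], null, [], [3]]
        }
    }
    ```
    
    The sketch builds Java lists for readability; a vectorized reader would presumably record offsets and lengths in the parent column instead, but the way the two level streams are interpreted is the same.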
    
    ### Benchmark
    
        val N = 10000
        withParquetTable((0 until N).map { i =>
          ((i to i + 1000).toList, (i to i + 100).map(_.toString).toList,
            (i to i + 1000).map(_.toDouble / 2).toList,
            ((0 to 10).map(_.toString).toList, (0 to 10).map(_.toString).toList))
        }, "t") {
          val benchmark = new Benchmark("Vectorization Parquet for nested types", N)
          benchmark.addCase("Vectorization Parquet reader", 10) { iter =>
            sql("SELECT _1[10], _2[20], _3[30], _4._1[5], _4._2[5] FROM t").collect()
          }
          benchmark.run()
        }
    
    Disabled vectorization:
    
        Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
        Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
        Vectorization Parquet for nested types:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
        ------------------------------------------------------------------------------------------------
        Vectorization Parquet reader                  1706 / 2207          0.0      170580.8       1.0X
    
    Enabled vectorization:
    
        Java HotSpot(TM) 64-Bit Server VM 1.8.0_71-b15 on Linux 3.19.0-25-generic
        Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
        Vectorization Parquet for nested types:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
        ------------------------------------------------------------------------------------------------
        Vectorization Parquet reader                   789 /  972          0.0       78919.4       1.0X
    
    
    ## How was this patch tested?
    
    Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 vectorized-parquet-complex-type

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14388.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14388
    
----
commit 8cfeb7e74843d8674c5354a67a7fc4f9d45100dd
Author: Liang-Chi Hsieh <si...@tw.ibm.com>
Date:   2016-07-27T09:32:18Z

    Add ArrayType, StructType support to vectorized Parquet reader.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by mallman <gi...@git.apache.org>.
Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @viirya If I do a simple `select` on an array field it works, but if I add an `order by` clause which orders by the array column I get exceptions like
    
    ```
    16/08/29 21:47:01 ERROR Executor: Exception in task 12.0 in stage 11.0 (TID 53)
    java.lang.ArrayIndexOutOfBoundsException: 4096
    	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.putByteArray(OnHeapColumnVector.java:401)
    	at org.apache.spark.sql.execution.vectorized.ColumnVector.putByteArray(ColumnVector.java:578)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:342)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:236)
    	at org.apache.spark.sql.execution.vectorized.ColumnVector.readBatch(ColumnVector.java:1101)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:263)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:266)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:251)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:138)
    	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:97)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:134)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:97)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
    	at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:628)
    	at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
    	at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1393)
    	at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1390)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    	at org.apache.spark.scheduler.Task.run(Task.scala:86)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    16/08/29 21:47:01 ERROR Executor: Exception in task 1.0 in stage 11.0 (TID 42)
    java.lang.ArrayIndexOutOfBoundsException: 4096
    	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.putByteArray(OnHeapColumnVector.java:401)
    	at org.apache.spark.sql.execution.vectorized.ColumnVector.putByteArray(ColumnVector.java:578)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.decodeDictionaryIds(VectorizedColumnReader.java:342)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:236)
    	at org.apache.spark.sql.execution.vectorized.ColumnVector.readBatch(ColumnVector.java:1101)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:263)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:266)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:251)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:138)
    	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:97)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    	at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
    	at org.spark_project.guava.collect.Ordering.leastOf(Ordering.java:670)
    	at org.apache.spark.util.collection.Utils$.takeOrdered(Utils.scala:37)
    	at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1393)
    	at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1$$anonfun$30.apply(RDD.scala:1390)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
    	at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785)
    	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
    	at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
    	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    	at org.apache.spark.scheduler.Task.run(Task.scala:86)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    ```




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by maver1ck <gi...@git.apache.org>.
Github user maver1ck commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @viirya 
    I will after the weekend.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    This change does not seem easy to maintain. I would like to close this for now and may reopen it later.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by mallman <gi...@git.apache.org>.
Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @viirya Any progress on this?




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @maver1ck Thanks for reporting this! I will take a look. Can you show me the schema you tested and what the data looks like? Thanks.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Hi @maver1ck Can you try the latest changes on your production workflow? Thank you!




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #62957 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62957/consoleFull)** for PR 14388 at commit [`8cfeb7e`](https://github.com/apache/spark/commit/8cfeb7e74843d8674c5354a67a7fc4f9d45100dd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/62957/
    Test PASSed.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63688/
    Test PASSed.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    retest this please.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @mallman Thanks! I promise not to share it with others.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @mallman Thanks. I will not share that file.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63669/
    Test PASSed.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #63677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63677/consoleFull)** for PR 14388 at commit [`d0d7230`](https://github.com/apache/spark/commit/d0d72302caf3c90ce75518a51cf9180a2d53a5ae).




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #63677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63677/consoleFull)** for PR 14388 at commit [`d0d7230`](https://github.com/apache/spark/commit/d0d72302caf3c90ce75518a51cf9180a2d53a5ae).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @maver1ck Any results from the test? Thank you.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by mallman <gi...@git.apache.org>.
Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @viirya I sent an email with a link to a test file to your public GitHub e-mail address.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #14388: [SPARK-16362][SQL] Support ArrayType and StructTy...

Posted by viirya <gi...@git.apache.org>.
Github user viirya closed the pull request at:

    https://github.com/apache/spark/pull/14388




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @mallman I ran a simple test but can't reproduce the issue. The following benchmark code selects an array column and adds an `order by` clause on it. Can you give me example code that reproduces it? Thanks.
    
        val N = 10000
        withParquetTable((0 until N).map { i =>
          ((i to i + 1000).toList, (i to i + 100).map(_.toString).toList,
            (i to i + 1000).map(_.toDouble / 2).toList,
            ((0 to 10).map(_.toString).toList, (0 to 10).map(_.toString).toList))
        }, "t") {
          val benchmark = new Benchmark("Vectorization Parquet for nested types", N)
          benchmark.addCase("Vectorization Parquet reader", 10) { iter =>
            sql("SELECT _1 FROM t ORDER BY _1").collect()
          }
          benchmark.run()
        }





[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #62957 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/62957/consoleFull)** for PR 14388 at commit [`8cfeb7e`](https://github.com/apache/spark/commit/8cfeb7e74843d8674c5354a67a7fc4f9d45100dd).




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    ping @maver1ck 




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @mallman Thanks for reporting this. It is helpful. I will investigate it.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #63688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63688/consoleFull)** for PR 14388 at commit [`d0d7230`](https://github.com/apache/spark/commit/d0d72302caf3c90ce75518a51cf9180a2d53a5ae).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #63669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63669/consoleFull)** for PR 14388 at commit [`9bae60f`](https://github.com/apache/spark/commit/9bae60f436f6d44fde1f33797edd8a4aa0c3604e).




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @mallman Not yet. I have been working on another PR recently. I will come back to this once that is resolved.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #63669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63669/consoleFull)** for PR 14388 at commit [`9bae60f`](https://github.com/apache/spark/commit/9bae60f436f6d44fde1f33797edd8a4aa0c3604e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    **[Test build #63688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63688/consoleFull)** for PR 14388 at commit [`d0d7230`](https://github.com/apache/spark/commit/d0d72302caf3c90ce75518a51cf9180a2d53a5ae).




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63677/
    Test FAILed.




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by maver1ck <gi...@git.apache.org>.
Github user maver1ck commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @viirya 
    I tested your patch on my production workflow and got the following error:
    ```
    Py4JJavaError: An error occurred while calling o56.count.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 1.0 failed 1 times, most recent failure: Lost task 20.0 in stage 1.0 (TID 21, 188.165.13.157): java.lang.ArrayIndexOutOfBoundsException: 4096
    	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.putIntsLittleEndian(OnHeapColumnVector.java:221)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readIntegers(VectorizedPlainValuesReader.java:68)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readIntegers(VectorizedRleValuesReader.java:189)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch(VectorizedColumnReader.java:388)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:247)
    	at org.apache.spark.sql.execution.vectorized.ColumnVector.readBatch(ColumnVector.java:1094)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:263)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:266)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:251)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:138)
    	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    	at org.apache.spark.scheduler.Task.run(Task.scala:86)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    	at java.lang.Thread.run(Thread.java:745)
    
    Driver stacktrace:
    	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
    	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
    	at scala.Option.foreach(Option.scala:257)
    	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
    	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
    	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
    	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
    	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1867)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1880)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1893)
    	at org.apache.spark.SparkContext.runJob(SparkContext.scala:1907)
    	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:899)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    	at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
    	at org.apache.spark.rdd.RDD.collect(RDD.scala:898)
    	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:290)
    	at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
    	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
    	at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
    	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
    	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2189)
    	at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2217)
    	at org.apache.spark.sql.Dataset$$anonfun$count$1.apply(Dataset.scala:2216)
    	at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2545)
    	at org.apache.spark.sql.Dataset.count(Dataset.scala:2216)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:606)
    	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:237)
    	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    	at py4j.Gateway.invoke(Gateway.java:280)
    	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:128)
    	at py4j.commands.CallCommand.execute(CallCommand.java:79)
    	at py4j.GatewayConnection.run(GatewayConnection.java:211)
    	at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 4096
    	at org.apache.spark.sql.execution.vectorized.OnHeapColumnVector.putIntsLittleEndian(OnHeapColumnVector.java:221)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedPlainValuesReader.readIntegers(VectorizedPlainValuesReader.java:68)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedRleValuesReader.readIntegers(VectorizedRleValuesReader.java:189)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readIntBatch(VectorizedColumnReader.java:388)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:247)
    	at org.apache.spark.sql.execution.vectorized.ColumnVector.readBatch(ColumnVector.java:1094)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:263)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.readBatchOnColumnVector(VectorizedParquetRecordReader.java:266)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:251)
    	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:138)
    	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:36)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:128)
    	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:91)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.agg_doAggregateWithoutKey$(Unknown Source)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
    	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    	at org.apache.spark.scheduler.Task.run(Task.scala:86)
    	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    	... 1 more
    ```




[GitHub] spark issue #14388: [SPARK-16362][SQL] Support ArrayType and StructType in v...

Posted by mallman <gi...@git.apache.org>.
Github user mallman commented on the issue:

    https://github.com/apache/spark/pull/14388
  
    @viirya I'll see what I can do. If nothing else, I may be able to share a private data file over S3 if you promise not to share it with anyone else.

