Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/10/29 03:30:16 UTC

[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/19601

    [SPARK-22383][SQL] Generate code to directly get value of primitive type array from ColumnVector for table cache

    ## What changes were proposed in this pull request?
    
    This PR generates Java code that reads a value of a primitive type array directly from a ColumnVector, without going through an iterator, when the data comes from the table cache (e.g. dataframe.cache). For primitive types, this improves runtime performance by eliminating the data copy from column-oriented storage to InternalRow that the SpecificColumnarIterator iterator performs.
    This is a follow-up PR of #18747.
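    
    A minimal sketch of the difference between the two read paths, under stated assumptions: the class and method names below are hypothetical and are not Spark APIs, and a plain `int[][]` stands in for the cached column. It only illustrates why skipping the per-row copy can help a filter such as `a[k] % 10 = 0`.
    
    ```java
    // Hypothetical illustration only; none of these names are Spark APIs.
    final class CachedFilterSketch {
      // Row-at-a-time path: copy each row's array out of columnar storage first.
      static long countViaRowCopy(int[][] column, int k) {
        long count = 0;
        for (int[] stored : column) {
          int[] rowCopy = stored.clone();  // stands in for the copy into a row object
          if (rowCopy[k] % 10 == 0) count++;
        }
        return count;
      }
    
      // Direct path: read only the needed element in place, with no per-row copy.
      static long countDirect(int[][] column, int k) {
        long count = 0;
        for (int[] stored : column) {
          if (stored[k] % 10 == 0) count++;
        }
        return count;
      }
    
      public static void main(String[] args) {
        int[][] column = { {0, 10, 20}, {1, 11, 21}, {2, 30, 22} };
        // Both paths must agree; only the amount of copying differs.
        System.out.println(countViaRowCopy(column, 1) == countDirect(column, 1));
      }
    }
    ```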
    
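    Benchmark result: **21.4x** (best time of 1368 ms for InternalRow codegen vs. 64 ms for ColumnVector codegen; each case appears in its own table below, so both show a Relative value of 1.0X)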
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    
    Filter for int primitive array with cache: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InternalRow codegen                           1368 / 1887         23.0          43.5       1.0X
    
    Filter for int primitive array with cache: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    ColumnVector codegen                            64 /   90        488.1           2.0       1.0X
    
    ```
    
    Benchmark program
    ```
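      // Run the benchmark; `values` (20M) is used as the array upper bound and as the benchmark cardinality.
      intArrayBenchmark(sqlContext, 1024 * 1024 * 20)
      def intArrayBenchmark(sqlContext: SQLContext, values: Int, iters: Int = 20): Unit = {
        import sqlContext.implicits._
        val benchmarkPT = new Benchmark("Filter for int primitive array with cache", values, iters)
        // NOTE: ROWS is not defined in this snippet; the enclosing benchmark code must supply it.
        val df = sqlContext.sparkContext.parallelize(0 to ROWS, 1)
                           .map(i => Array.range(i, values)).toDF("a").cache
        df.count  // force materialization of the cache
        val str = "ColumnVector"
        var c: Long = 0  // accumulates the counts so that the result is used
        benchmarkPT.addCase(s"$str codegen") { iter =>
          // Filter on a single element in the middle of each cached array.
          c += df.filter(s"a[${values/2}] % 10 = 0").count
        }
        benchmarkPT.run()
        df.unpersist(true)
        System.gc()
      }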
    ```
    
    ## How was this patch tested?
    
    Added test cases into `ColumnVectorSuite`, `DataFrameTungstenSuite`, and `WholeStageCodegenSuite`


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22383

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19601
    
----
commit 80b9e319211765807766e5cf70e995bdbbebf22e
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-10-29T03:28:06Z

    initial commit

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    @cloud-fan could you please review this again? Now, this PR does not apply any change to `ColumnVector` and `WritableColumnVector`.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83246/testReport)** for PR 19601 at commit [`b971506`](https://github.com/apache/spark/commit/b971506f8d5138a2c23e039427d547b736079c13).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83237/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    I'd also like to improve the write path. I think the current way of caching array types is not efficient; an Arrow-like format, which puts all elements (including those of nested arrays) together, is better for encoding and compression.
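    
    A rough sketch of what "putting all elements together" could look like for a column of int arrays, assuming a flattened values buffer plus an offsets array in the spirit of Arrow's list layout. The class name and fields are hypothetical; this is not Arrow's or Spark's implementation, and nulls and nested arrays are left out.
    
    ```java
    // Hypothetical flattened layout for a column whose rows are int arrays.
    final class FlattenedArrayColumnSketch {
      final int[] values;   // elements of all rows, stored back to back
      final int[] offsets;  // row r occupies values[offsets[r] .. offsets[r + 1])
    
      FlattenedArrayColumnSketch(int[][] rows) {
        offsets = new int[rows.length + 1];
        for (int r = 0; r < rows.length; r++) {
          offsets[r + 1] = offsets[r] + rows[r].length;
        }
        values = new int[offsets[rows.length]];
        for (int r = 0; r < rows.length; r++) {
          System.arraycopy(rows[r], 0, values, offsets[r], rows[r].length);
        }
      }
    
      // Number of elements in the given row.
      int length(int row) {
        return offsets[row + 1] - offsets[row];
      }
    
      // Element `index` of the given row; no per-row bounds check in this sketch.
      int get(int row, int index) {
        return values[offsets[row] + index];
      }
    
      public static void main(String[] args) {
        FlattenedArrayColumnSketch col =
            new FlattenedArrayColumnSketch(new int[][] { {1, 2}, {}, {3, 4, 5} });
        System.out.println(col.length(2) + " elements, second is " + col.get(2, 1));
      }
    }
    ```
    
    Because every element sits in one contiguous buffer, a single encoder or compressor can run over `values` instead of over many small per-row buffers.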


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83689/testReport)** for PR 19601 at commit [`eac3d30`](https://github.com/apache/spark/commit/eac3d305e63e131e45115f79fadfb2bb86a6d00e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    There are some parts that rely on the format of `UnsafeArrayData`; that is, a bit-by-bit copy of `UnsafeArrayData` is performed. Can we handle this copy using the new format for an unsafe array?
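    
    To make the dependency on the format concrete, here is a minimal sketch that assumes the layout described in the `UnsafeArrayData` class documentation: an 8-byte element count, a null bitmap padded to 8-byte words, then fixed-width values. The class and method names are hypothetical, nulls and any trailing padding are not handled, and little-endian byte order is fixed only to keep the sketch deterministic.
    
    ```java
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    
    // Hypothetical reader/writer for an int array in an UnsafeArrayData-like layout.
    final class UnsafeIntArrayLayoutSketch {
      // Header = 8-byte element count + null bitmap rounded up to 8-byte words.
      static int headerBytes(int numElements) {
        return 8 + ((numElements + 63) / 64) * 8;
      }
    
      // Serialize without nulls; the bitmap stays zeroed by allocate().
      static byte[] write(int[] elements) {
        int n = elements.length;
        ByteBuffer buf = ByteBuffer.allocate(headerBytes(n) + 4 * n)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.putLong(0, n);
        for (int i = 0; i < n; i++) {
          buf.putInt(headerBytes(n) + 4 * i, elements[i]);
        }
        return buf.array();
      }
    
      static int numElements(byte[] bytes) {
        return (int) ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getLong(0);
      }
    
      // Read one element straight from the bytes; the offset depends on the layout.
      static int getInt(byte[] bytes, int index) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        int n = (int) buf.getLong(0);
        return buf.getInt(headerBytes(n) + 4 * index);
      }
    
      public static void main(String[] args) {
        byte[] original = write(new int[] {7, 8, 9});
        byte[] copy = original.clone();  // bit-by-bit copy
        System.out.println(numElements(copy) + " elements, last is " + getInt(copy, 2));
      }
    }
    ```
    
    The cloned bytes are readable only because the reader knows the header size and element width, so any code that copies the bytes verbatim is tied to this layout.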


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83208/testReport)** for PR 19601 at commit [`c78d462`](https://github.com/apache/spark/commit/c78d462448c948120c0b9570163af1c37c2cc3ef).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83210 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83210/testReport)** for PR 19601 at commit [`c78d462`](https://github.com/apache/spark/commit/c78d462448c948120c0b9570163af1c37c2cc3ef).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84215 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84215/testReport)** for PR 19601 at commit [`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    @cloud-fan could you please review this again? I merged with the `ColumnarArray`. As you suggested, the latest implementation does not change `ColumnVector` and `ColumnarArray`.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84215/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83179 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83179/testReport)** for PR 19601 at commit [`80b9e31`](https://github.com/apache/spark/commit/80b9e319211765807766e5cf70e995bdbbebf22e).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Can we use `OffHeapColumnVector` for cached data?


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    There are two approaches to supporting a primitive array that is treated as binary. One is to add a new `ColumnVector.Array`, which is what I did. The other is to add a new `WritableColumnVector`, in the way that @ueshin added `ArrowColumnVector`. Either is acceptable to me.
    
    I can add a new `ColumnVector` for primitive arrays (e.g. `UnsafeColumnVector`), as was done for Arrow. Is that OK with you? Adding the new class would avoid data conversion, as in the Arrow case.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    After thinking about the choice for a while, I concluded that it is better to add a new `WritableColumnVector` (i.e. `UnsafeColumnVector`) and to keep the current `ColumnVector.Array`.  
    I think that adding a new class will give us some flexibility and a good abstraction boundary between the public class `ColumnVector` and other internal classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    @Ueshin @cloud-fan could you please review this?


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r151329119
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/UnsafeColumnVector.java ---
    @@ -0,0 +1,517 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.commons.lang.NotImplementedException;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.Platform;
    +
    +/**
    + * A column backed by UnsafeArrayData on byte[].
    + */
    +public final class UnsafeColumnVector extends WritableColumnVector {
    --- End diff --
    
    Since this `UnsafeColumnVector` represents an array column, will we use APIs such as `getBoolean`, `getBooleans`, etc.?


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    The current `ColumnVector` uses a primitive type array (e.g. `int[]` or `double[]`) based on the data type of each column. On the other hand, cached data uses `byte[]` for all data types.  
    Should we change the format (`Array[Array[Byte]]`) in [`CachedBatch`](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala#L53) for a primitive array?
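    
    A small sketch of the mismatch, under stated assumptions: the names are hypothetical, the cached bytes are taken to hold only packed 4-byte ints with no header, and little-endian order is fixed for determinism. Moving `byte[]` data into an `int[]`-backed vector needs a conversion pass, whereas reading in place does not.
    
    ```java
    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    
    // Hypothetical helpers contrasting conversion with in-place reads.
    final class CacheToVectorConversionSketch {
      // Conversion path: decode every element into a freshly allocated int[].
      static int[] toIntArray(byte[] cached) {
        int[] out = new int[cached.length / 4];
        ByteBuffer.wrap(cached).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer().get(out);
        return out;
      }
    
      // In-place path: decode only the requested element from the cached bytes.
      static int readInPlace(byte[] cached, int index) {
        return ByteBuffer.wrap(cached).order(ByteOrder.LITTLE_ENDIAN).getInt(4 * index);
      }
    
      public static void main(String[] args) {
        byte[] cached = {5, 0, 0, 0, 6, 0, 0, 0};  // the ints 5 and 6, little-endian
        System.out.println(toIntArray(cached)[1] == readInPlace(cached, 1));
      }
    }
    ```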


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83463/testReport)** for PR 19601 at commit [`4666974`](https://github.com/apache/spark/commit/46669745be2f64be5fec2daa1f1068057ef61282).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83463/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83931/testReport)** for PR 19601 at commit [`9a41914`](https://github.com/apache/spark/commit/9a41914694c8f1f56f294cc2380bd6ecf1ce73b8).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    My feeling is that we should change the cache format of array types to make it compatible with `ColumnVector`, so that we don't need a conversion from cached data to a columnar batch.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83208/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83250/testReport)** for PR 19601 at commit [`b971506`](https://github.com/apache/spark/commit/b971506f8d5138a2c23e039427d547b736079c13).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83931/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83905 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83905/testReport)** for PR 19601 at commit [`b025565`](https://github.com/apache/spark/commit/b025565174805e99a2a6f9f8a64e2b44f62da4e5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84253/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83934/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83465 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83465/testReport)** for PR 19601 at commit [`4666974`](https://github.com/apache/spark/commit/46669745be2f64be5fec2daa1f1068057ef61282).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84080/testReport)** for PR 19601 at commit [`63d9d57`](https://github.com/apache/spark/commit/63d9d576799d057646e991326c38b5fdb3a9f361).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Thanks!


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    @cloud-fan could you please review this again? This version avoids overriding `ColumnVector.getArray`, as you suggested.
    cc @ueshin


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83208/testReport)** for PR 19601 at commit [`c78d462`](https://github.com/apache/spark/commit/c78d462448c948120c0b9570163af1c37c2cc3ef).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class UnsafeColumnVector extends WritableColumnVector `


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    So for primitive types, we encode and compress them to binary. When reading cached data, they are decoded to a primitive array and can be put in `OnHeapColumnVector` directly.
    
    For primitive type arrays, we treat them as binary. So when decoding, we get a `byte[]` and need extra work to convert it to the primitive type and put it in `OnHeapColumnVector`.
    
    Can we change how we encode array types, the way Arrow does?
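
To illustrate the conversion cost described in the comment above, here is a minimal, self-contained sketch. It is not code from the PR: the class `BinaryArrayDecode`, its methods, and the little-endian raw-bytes encoding are assumptions chosen for illustration. It contrasts materializing a whole `int[]` from a `byte[]` with reading a single element in place.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BinaryArrayDecode {
    // Hypothetical encoding: an int array stored as raw little-endian bytes.
    static byte[] encode(int[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Copying path: build a full int[] before any element can be read.
    static int[] decodeToIntArray(byte[] bytes) {
        int[] out = new int[bytes.length / Integer.BYTES];
        ByteBuffer.wrap(bytes)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asIntBuffer()
                  .get(out);
        return out;
    }

    // Direct path: read one element from the bytes, with no intermediate array.
    static int readIntAt(byte[] bytes, int index) {
        return ByteBuffer.wrap(bytes)
                         .order(ByteOrder.LITTLE_ENDIAN)
                         .getInt(index * Integer.BYTES);
    }

    public static void main(String[] args) {
        byte[] cached = encode(new int[] {10, 20, 30, 40});
        System.out.println(decodeToIntArray(cached)[2]); // prints 30
        System.out.println(readIntAt(cached, 2));        // prints 30
    }
}
```

Both paths return the same value; the difference is that the copying path touches every element even when a query needs only one of them.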


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    We'd need to change the `UnsafeArrayData` format too, to avoid data copying when building the cache. BTW I think it's ok to release this columnar cache reader without efficient complex type support, so we don't need to rush.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83250/testReport)** for PR 19601 at commit [`b971506`](https://github.com/apache/spark/commit/b971506f8d5138a2c23e039427d547b736079c13).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84253/testReport)** for PR 19601 at commit [`20d2ba2`](https://github.com/apache/spark/commit/20d2ba2819f9f6c5c10752df2d5f9ca450b0ad51).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83460 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83460/testReport)** for PR 19601 at commit [`2270304`](https://github.com/apache/spark/commit/2270304e417eebed0c3f4e80392316839539e9eb).


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk closed the pull request at:

    https://github.com/apache/spark/pull/19601


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83930 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83930/testReport)** for PR 19601 at commit [`17449b4`](https://github.com/apache/spark/commit/17449b4748c5c32539227c7f50c4b6ec236ab4ee).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r151311790
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java ---
    @@ -93,28 +93,6 @@ public static void populate(WritableColumnVector col, InternalRow row, int field
         }
       }
     
    -  /**
    -   * Returns the array data as the java primitive array.
    -   * For example, an array of IntegerType will return an int[].
    -   * Throws exceptions for unhandled schemas.
    -   */
    -  public static Object toPrimitiveJavaArray(ColumnarArray array) {
    --- End diff --
    
    Why this method? It looks like it is only used in tests.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83460 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83460/testReport)** for PR 19601 at commit [`2270304`](https://github.com/apache/spark/commit/2270304e417eebed0c3f4e80392316839539e9eb).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Both ways work; just pick the simpler one. I'm concerned about how to access the nested array. You can try both approaches and see which one solves the problem more easily.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83250/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84080/testReport)** for PR 19601 at commit [`63d9d57`](https://github.com/apache/spark/commit/63d9d576799d057646e991326c38b5fdb3a9f361).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84082/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84080/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83934/testReport)** for PR 19601 at commit [`9a41914`](https://github.com/apache/spark/commit/9a41914694c8f1f56f294cc2380bd6ecf1ce73b8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    @cloud-fan could you please review this PR?
    In my prototype, I succeeded in supporting the current nested array for table cache by changing only `UnsafeColumnVector.java`.
    
    For ease of review, I would like to ask you to review this PR for the simple case (non-nested primitive array) first.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83246/testReport)** for PR 19601 at commit [`b971506`](https://github.com/apache/spark/commit/b971506f8d5138a2c23e039427d547b736079c13).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83465/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83689/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83223 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83223/testReport)** for PR 19601 at commit [`b971506`](https://github.com/apache/spark/commit/b971506f8d5138a2c23e039427d547b736079c13).


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r151325703
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/UnsafeColumnVector.java ---
    @@ -0,0 +1,517 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.commons.lang.NotImplementedException;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.Platform;
    +
    +/**
    + * A column backed by UnsafeArrayData on byte[].
    + */
    +public final class UnsafeColumnVector extends WritableColumnVector {
    --- End diff --
    
    You are right that `UnsafeColumnVector.putByteArray` is used to put the whole array in a `byte[]` for `UnsafeArrayData`. I will add a comment explaining the usage of this API to make it clear.
    
    Good catch on `getBooleans`. That is my mistake, since it has to take `rowId` into account. I will fix this.
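
The exact defect in `getBooleans` is not shown in this thread. As a minimal sketch, assuming the problem was a bulk getter that ignored its `rowId` offset, the following self-contained Java contrasts the two shapes. The class `BooleanColumnSketch`, its one-byte-per-value storage, and both method bodies are hypothetical and are not code from the PR.

```java
import java.util.Arrays;

class BooleanColumnSketch {
    // Hypothetical storage: one byte per boolean, 1 means true.
    private final byte[] data;

    BooleanColumnSketch(boolean[] values) {
        data = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            data[i] = (byte) (values[i] ? 1 : 0);
        }
    }

    // Assumed buggy shape: always reads from position 0 and ignores rowId.
    boolean[] getBooleansIgnoringRowId(int rowId, int count) {
        boolean[] out = new boolean[count];
        for (int i = 0; i < count; i++) {
            out[i] = data[i] == 1;
        }
        return out;
    }

    // Corrected shape: element i of the result comes from row rowId + i.
    boolean[] getBooleans(int rowId, int count) {
        boolean[] out = new boolean[count];
        for (int i = 0; i < count; i++) {
            out[i] = data[rowId + i] == 1;
        }
        return out;
    }

    public static void main(String[] args) {
        BooleanColumnSketch col =
            new BooleanColumnSketch(new boolean[] {true, false, false, true, true});
        // Rows 3 and 4 are both true.
        System.out.println(Arrays.toString(col.getBooleans(3, 2)));              // prints [true, true]
        System.out.println(Arrays.toString(col.getBooleansIgnoringRowId(3, 2))); // prints [true, false]
    }
}
```

A test that only reads from `rowId` 0 cannot tell the two versions apart, so a regression test for this kind of bug would need a non-zero starting row.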


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r147578474
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -367,9 +551,13 @@ public Object get(int ordinal, DataType dataType) {
       /**
        * Returns the array at rowid.
        */
    -  public final ColumnVector.Array getArray(int rowId) {
    -    resultArray.length = getArrayLength(rowId);
    -    resultArray.offset = getArrayOffset(rowId);
    +  public final ArrayData getArray(int rowId) {
    --- End diff --
    
    We should not change the return type. `ColumnVector` will be public eventually, and `ArrayData` is not a public type.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83236/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83210 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83210/testReport)** for PR 19601 at commit [`c78d462`](https://github.com/apache/spark/commit/c78d462448c948120c0b9570163af1c37c2cc3ef).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class UnsafeColumnVector extends WritableColumnVector `


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83905/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83930/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    I agree with you that we need to improve the write path. As you suggested before, that will be addressed in a follow-up PR, after improving the frequently executed read path.
    
    For improving the read path, which approach is better: adding a new `ColumnVector.Array` or adding a new `WritableColumnVector`?


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83210/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Sure, let me close this


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    For now, this implementation is limited to non-nested arrays, for ease of review.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    @cloud-fan could you please review this PR?
    In my prototype, I succeeded in supporting the current nested array for table cache by changing only `UnsafeColumnVector.java`.
    
    For ease of review, I would like to ask you to review this PR for the simple case (non-nested primitive array) first.
    cc: @ueshin


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk closed the pull request at:

    https://github.com/apache/spark/pull/19601


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by kiszk <gi...@git.apache.org>.
GitHub user kiszk reopened a pull request:

    https://github.com/apache/spark/pull/19601

    [SPARK-22383][SQL] Generate code to directly get value of primitive type array from ColumnVector for table cache

    ## What changes were proposed in this pull request?
    
    This PR generates the Java code to directly get a value for a primitive type array in ColumnVector without using an iterator for table cache (e.g. dataframe.cache). This PR improves runtime performance by eliminating data copy from column-oriented storage to InternalRow in a SpecificColumnarIterator iterator for primitive type. This is a follow-up PR of #18747.
    
    The idea of this implementation is to add `ColumnVector.UnsafeArray`, which holds an `UnsafeArrayData` for an array, in addition to the existing `ColumnVector.Array`, which holds a `ColumnVector` backed by a Java primitive array.
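
As a rough illustration of reading array elements straight out of a byte-backed buffer, the sketch below uses a simplified layout loosely modeled on `UnsafeArrayData`: an 8-byte element count, a null bitmap padded to 8-byte words, then 4-byte int values. The class `UnsafeIntArraySketch`, its helper methods, and the fixed little-endian byte order are assumptions for illustration; this is not Spark's implementation and not code from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnsafeIntArraySketch {
    private final ByteBuffer buf;
    private final int numElements;
    private final int valuesOffset;

    UnsafeIntArraySketch(byte[] bytes) {
        buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        numElements = (int) buf.getLong(0);
        valuesOffset = headerBytes(numElements);
    }

    // Assumed header: 8 bytes for the count plus one 8-byte bitmap word per 64 elements.
    static int headerBytes(int n) {
        return 8 + ((n + 63) / 64) * 8;
    }

    // Helper for the example only: serialize nullable ints into the assumed layout.
    static byte[] write(Integer[] values) {
        int n = values.length;
        int header = headerBytes(n);
        ByteBuffer b = ByteBuffer.allocate(header + n * Integer.BYTES)
                                 .order(ByteOrder.LITTLE_ENDIAN);
        b.putLong(0, n);
        for (int i = 0; i < n; i++) {
            if (values[i] == null) {
                int wordOffset = 8 + (i / 64) * 8;
                b.putLong(wordOffset, b.getLong(wordOffset) | (1L << (i % 64)));
            } else {
                b.putInt(header + i * Integer.BYTES, values[i]);
            }
        }
        return b.array();
    }

    int numElements() {
        return numElements;
    }

    boolean isNullAt(int i) {
        long word = buf.getLong(8 + (i / 64) * 8);
        return (word & (1L << (i % 64))) != 0;
    }

    // Reads element i in place; no int[] is materialized.
    int getInt(int i) {
        return buf.getInt(valuesOffset + i * Integer.BYTES);
    }

    public static void main(String[] args) {
        byte[] cached = write(new Integer[] {7, null, 42});
        UnsafeIntArraySketch array = new UnsafeIntArraySketch(cached);
        System.out.println(array.numElements()); // prints 3
        System.out.println(array.isNullAt(1));   // prints true
        System.out.println(array.getInt(2));     // prints 42
    }
}
```

The point of the design is that an expression such as `a[i]` needs only one offset computation and one read against the cached bytes, instead of first copying the whole array into a row object.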
    
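    Benchmark result: **21.4x** (best time 1368 ms for InternalRow codegen vs. 64 ms for ColumnVector codegen; each run reports its own 1.0X baseline)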
    
    ```
    OpenJDK 64-Bit Server VM 1.8.0_121-8u121-b13-0ubuntu1.16.04.2-b13 on Linux 4.4.0-22-generic
    Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz
    
    Filter for int primitive array with cache: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    InternalRow codegen                           1368 / 1887         23.0          43.5       1.0X
    
    Filter for int primitive array with cache: Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    ColumnVector codegen                            64 /   90        488.1           2.0       1.0X
    
    ```
    
    Benchmark program
    ```
      intArrayBenchmark(sqlContext, 1024 * 1024 * 20)
      def intArrayBenchmark(sqlContext: SQLContext, values: Int, iters: Int = 20): Unit = {
        import sqlContext.implicits._
        val benchmarkPT = new Benchmark("Filter for int primitive array with cache", values, iters)
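        // NOTE: ROWS (the number of rows) is not defined in this snippet; it must come from the enclosing scope.
        val df = sqlContext.sparkContext.parallelize(0 to ROWS, 1)
                           .map(i => Array.range(i, values)).toDF("a").cache
        df.count  // force to create df.cache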
        val str = "ColumnVector"
        var c: Long = 0
        benchmarkPT.addCase(s"$str codegen") { iter =>
          c += df.filter(s"a[${values/2}] % 10 = 0").count
        }
        benchmarkPT.run()
        df.unpersist(true)
        System.gc()
      }
    ```
    
    ## How was this patch tested?
    
    Added test cases to `ColumnVectorSuite`, `DataFrameTungstenSuite`, and `WholeStageCodegenSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-22383

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19601.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19601
    
----
commit 12dd996b134cbd6aeb83d70dc14f50fc2516e6ea
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-10-29T03:28:06Z

    initial commit

commit 7fa67d1c5d259dabe34e35427c9f69746ac82260
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-10-30T09:48:46Z

    add UnsafeColumnVector to support array for table cache

commit 05ec886fb93cb0f05137f3b69bcd20b20455225b
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-10-30T17:43:40Z

    fix failures in CacheTableSuite, HiveCompatibilitySuite, and HiveQuerySuite

commit 761516a1e3234472a92493a9375e1db79998a1b0
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-10-30T17:44:05Z

    remove wrong assert to fix failures

commit 98f764fea0c8ee256395d285b7b69d62797d3e93
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-05T06:57:47Z

    avoid to override ColumnVector.getArray()

commit 5477d5b3bc2362d8ab861af1601989dfb1fefa79
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-05T15:53:24Z

    fix test failures

commit 029af07d73cba3beafe0c591ef4c14e18bfe4dd1
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-10T16:06:20Z

    Remove ColumnVector.putUnsafeData()

commit 94c6a97017528a068214117ce061b1ce9dd053b3
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-15T18:32:30Z

    rebase with master

commit 11af85cf0d13f6cff2f3c246afc1e664de8e3e41
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-16T07:18:17Z

    address review comment

commit 63d9d576799d057646e991326c38b5fdb3a9f361
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-16T07:54:19Z

    fix compilation error

commit 9b6b890b0444f3a20e73691528b59ad21edb07b8
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-11-21T18:51:00Z

    fix failures of rebase

----


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83240/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83934/testReport)** for PR 19601 at commit [`9a41914`](https://github.com/apache/spark/commit/9a41914694c8f1f56f294cc2380bd6ecf1ce73b8).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    @cloud-fan could you please review this?


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r151324069
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVectorUtils.java ---
    @@ -93,28 +93,6 @@ public static void populate(WritableColumnVector col, InternalRow row, int field
         }
       }
     
    -  /**
    -   * Returns the array data as the java primitive array.
    -   * For example, an array of IntegerType will return an int[].
    -   * Throws exceptions for unhandled schemas.
    -   */
    -  public static Object toPrimitiveJavaArray(ColumnarArray array) {
    --- End diff --
    
    This method was used only in tests. I removed it and replaced its uses with another method.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Can we hold this for a while? I'm thinking about the ColumnVector refactoring and want to see how to deal with nested data uniformly.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83179/
    Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83460/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Does it work for non-top-level array type fields and nested arrays? Generally I think this is not the right direction. The root cause is that the table cache array format is not the Arrow-style format (which puts all leaf elements together) and can't fit in `ColumnarArray`. To avoid the expensive conversion, what you did here is kind of creating a fake `ColumnarArray` to be able to read the table cache array data directly. I think this is not a general solution but just a hack.
    
    Furthermore, all complex types face the same problem in table cache. I think the right direction is to revisit the columnar cache format and design how to deal with complex types. This is something we dropped at the beginning (we treat complex types as binary data), and now it's time to pay it back.
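
    To make the contrast concrete, here is a minimal sketch of an Arrow-style layout for a nested `array<array<int>>` column, in which all leaf elements sit in one flat buffer and each nesting level is described only by an offsets array. The class, field, and method names are hypothetical (they are neither Spark nor Arrow APIs), and null handling is omitted.

    ```java
    import java.util.Arrays;

    // Hypothetical illustration of an Arrow-style nested list layout.
    // Not Spark or Arrow code; validity bitmaps (nulls) are omitted.
    final class ArrowStyleListSketch {
      // Logical rows: row 0 = [[1, 2], [3]], row 1 = [[4, 5, 6]]
      static final int[] OUTER_OFFSETS = {0, 2, 3};    // row i owns inner lists [o[i], o[i+1])
      static final int[] INNER_OFFSETS = {0, 2, 3, 6}; // inner list j owns leaves [o[j], o[j+1])
      static final int[] LEAVES = {1, 2, 3, 4, 5, 6};  // all leaf elements stored together

      // Reads element k of inner list `inner` of row `row` without materializing any array.
      static int get(int row, int inner, int k) {
        int j = OUTER_OFFSETS[row] + inner;
        return LEAVES[INNER_OFFSETS[j] + k];
      }

      // Number of inner lists in a row.
      static int numInnerLists(int row) {
        return OUTER_OFFSETS[row + 1] - OUTER_OFFSETS[row];
      }

      // Copies one inner list out, only so that it can be printed.
      static int[] innerList(int row, int inner) {
        int j = OUTER_OFFSETS[row] + inner;
        return Arrays.copyOfRange(LEAVES, INNER_OFFSETS[j], INNER_OFFSETS[j + 1]);
      }

      public static void main(String[] args) {
        System.out.println(get(0, 1, 0));                      // row 0, inner list 1, element 0
        System.out.println(Arrays.toString(innerList(1, 0)));  // the only inner list of row 1
      }
    }
    ```

    With a layout like this, reading a nested element is plain index arithmetic over shared buffers, whereas a cache that stores each array as an opaque binary value per row has no shared leaf buffer to index into.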


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83931/testReport)** for PR 19601 at commit [`9a41914`](https://github.com/apache/spark/commit/9a41914694c8f1f56f294cc2380bd6ecf1ce73b8).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r151315786
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/UnsafeColumnVector.java ---
    @@ -0,0 +1,517 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.commons.lang.NotImplementedException;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.Platform;
    +
    +/**
    + * A column backed by UnsafeArrayData on byte[].
    + */
    +public final class UnsafeColumnVector extends WritableColumnVector {
    --- End diff --
    
    This abstraction looks confusing at first glance. It does not seem to follow the usual usage of some `ColumnVector` APIs. It looks like this uses `putByteArray` to set up the byte array `data`, which stores the data of this array column.
    
    IIUC, this is proposed to represent only an array column, but some of the API implementations look odd. For example, `getBoolean` respects the `rowId` parameter and `getBooleans` doesn't.
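
    One way to read this concern is that callers would normally expect the single-value getter and the bulk getter to agree on what `rowId` means. Below is a minimal sketch of that expectation, using a hypothetical, simplified class that is not Spark's `ColumnVector` and does not reproduce the code in this PR.

    ```java
    import java.util.Arrays;

    // Hypothetical, simplified column of booleans; not Spark's ColumnVector.
    final class BooleanColumnSketch {
      private final boolean[] data;

      BooleanColumnSketch(boolean[] data) {
        this.data = data;
      }

      // Single-value getter: rowId selects the element.
      boolean getBoolean(int rowId) {
        return data[rowId];
      }

      // Bulk getter: rowId is the start position, count is the number of values.
      boolean[] getBooleans(int rowId, int count) {
        if (rowId < 0 || count < 0 || rowId + count > data.length) {
          throw new IndexOutOfBoundsException("rowId=" + rowId + ", count=" + count);
        }
        return Arrays.copyOfRange(data, rowId, rowId + count);
      }

      public static void main(String[] args) {
        BooleanColumnSketch col = new BooleanColumnSketch(new boolean[] {true, false, false, true});
        boolean[] bulk = col.getBooleans(1, 3);
        // Both getters honor rowId, so bulk[i] matches getBoolean(1 + i).
        for (int i = 0; i < bulk.length; i++) {
          System.out.println(bulk[i] == col.getBoolean(1 + i));
        }
      }
    }
    ```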


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Jenkins, retest this please


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83930 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83930/testReport)** for PR 19601 at commit [`17449b4`](https://github.com/apache/spark/commit/17449b4748c5c32539227c7f50c4b6ec236ab4ee).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83905 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83905/testReport)** for PR 19601 at commit [`b025565`](https://github.com/apache/spark/commit/b025565174805e99a2a6f9f8a64e2b44f62da4e5).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    My prototype can handle nested arrays by changing `UnsafeArray.getArray` and its callee methods, and it does not require changes to `ColumnVector`.  
    If the refactoring takes more than several weeks, I can commit my prototype with nested array support so that it can be merged into Spark 2.3.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84253 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84253/testReport)** for PR 19601 at commit [`20d2ba2`](https://github.com/apache/spark/commit/20d2ba2819f9f6c5c10752df2d5f9ca450b0ad51).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    I see. Let us revisit this design later.  
    
    I would appreciate it if you would review this columnar cache reader for simple primitive-type (non-nested) arrays.


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r151406905
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/UnsafeColumnVector.java ---
    @@ -0,0 +1,517 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +package org.apache.spark.sql.execution.vectorized;
    +
    +import java.nio.ByteBuffer;
    +
    +import org.apache.commons.lang.NotImplementedException;
    +
    +import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
    +import org.apache.spark.sql.types.*;
    +import org.apache.spark.unsafe.Platform;
    +
    +/**
    + * A column backed by UnsafeArrayData on byte[].
    + */
    +public final class UnsafeColumnVector extends WritableColumnVector {
    --- End diff --
    
    These two methods `getBoolean` and `getBooleans` are used [here](github.com/apache/spark/blob/dce1610ae376af00712ba7f4c99bfb4c006dbaec/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarArray.java) and [there](github.com/apache/spark/blob/dce1610ae376af00712ba7f4c99bfb4c006dbaec/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarArray.java) in `ColumnarArray`.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83689/testReport)** for PR 19601 at commit [`eac3d30`](https://github.com/apache/spark/commit/eac3d305e63e131e45115f79fadfb2bb86a6d00e).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Hi, @kiszk . Can we close this for now? You can make another PR later if you want.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83465 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83465/testReport)** for PR 19601 at commit [`4666974`](https://github.com/apache/spark/commit/46669745be2f64be5fec2daa1f1068057ef61282).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Hi, @kiszk . Is this still valid?


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #83179 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83179/testReport)** for PR 19601 at commit [`80b9e31`](https://github.com/apache/spark/commit/80b9e319211765807766e5cf70e995bdbbebf22e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  public static final class UnsafeArray extends ArrayData `


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84082/testReport)** for PR 19601 at commit [`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    This approach also works for nested arrays. I have this implementation on my machine. For ease of review, I committed the version that supports only primitive-type arrays. If you like it, I can commit the version with nested array support.  
    Yes, I created a wrapper `ColumnarArray` corresponding to an `UnsafeArrayData` for the array at each nesting level. It can **avoid expensive data copy**, which is very important for arrays and complex types because these data structures are large.
    
    If the new design can still avoid the expensive data copy by pointing at a part of one large array rather than copying data out of it (you may be thinking of [such a format](https://github.com/apache/arrow/blob/master/format/Layout.md#example-layout-listlistbyte)), I am happy to revisit the format with you. This is an internal implementation issue. The redesign would not affect the external interfaces `ColumnVector` and `ColumnarArray`.
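
    As a rough illustration of "pointing at a part of one large array", the sketch below wraps a region of a backing `int[]` in a read-only view instead of copying it. `IntSliceView` and its methods are hypothetical names chosen for this example; they are not the classes in this PR.

    ```java
    // Hypothetical sketch: a read-only view over part of a larger int[].
    // It illustrates pointing at a region of shared storage instead of copying it.
    final class IntSliceView {
      private final int[] backing; // shared with the owner, never copied
      private final int offset;
      private final int length;

      IntSliceView(int[] backing, int offset, int length) {
        if (offset < 0 || length < 0 || offset + length > backing.length) {
          throw new IndexOutOfBoundsException("slice out of range");
        }
        this.backing = backing;
        this.offset = offset;
        this.length = length;
      }

      int numElements() {
        return length;
      }

      int getInt(int ordinal) {
        if (ordinal < 0 || ordinal >= length) {
          throw new IndexOutOfBoundsException("ordinal " + ordinal);
        }
        return backing[offset + ordinal];
      }

      public static void main(String[] args) {
        int[] column = {10, 11, 12, 13, 14, 15};
        IntSliceView row = new IntSliceView(column, 2, 3); // covers 12, 13, 14
        System.out.println(row.getInt(0));
        column[2] = 99; // the view observes the change because no copy was made
        System.out.println(row.getInt(0));
      }
    }
    ```

    Creating such a view costs a constant amount of work regardless of the array length, which is the property that matters when the arrays are large.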


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83246/
    Test FAILed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84215 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84215/testReport)** for PR 19601 at commit [`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8).


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    **[Test build #84082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84082/testReport)** for PR 19601 at commit [`9b6b890`](https://github.com/apache/spark/commit/9b6b890b0444f3a20e73691528b59ad21edb07b8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19601: [SPARK-22383][SQL] Generate code to directly get ...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19601#discussion_r147578600
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -367,9 +551,13 @@ public Object get(int ordinal, DataType dataType) {
       /**
        * Returns the array at rowid.
        */
    -  public final ColumnVector.Array getArray(int rowId) {
    -    resultArray.length = getArrayLength(rowId);
    -    resultArray.offset = getArrayOffset(rowId);
    +  public final ArrayData getArray(int rowId) {
    --- End diff --
    
    I see.
    One question: `ColumnVector.Array` has some public fields such as `length`. I think it would be better to use an accessor such as `length` or `getLength` instead. What do you think?


---



[GitHub] spark issue #19601: [SPARK-22383][SQL] Generate code to directly get value o...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19601
  
    Merged build finished. Test FAILed.


---
