Posted to reviews@spark.apache.org by kiszk <gi...@git.apache.org> on 2017/07/21 14:24:04 UTC

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector...

GitHub user kiszk opened a pull request:

    https://github.com/apache/spark/pull/18704

    [SPARK-20783][SQL] Create CachedBatchColumnVector to abstract existing compressed column (batch method)

    ## What changes were proposed in this pull request?
    
    This PR adds a new class, `OnHeapCachedBatch`, derived from the `ColumnVector` class, which can hold compressed data by using `CompressibleColumnAccessor`.
    
    As a first step, this JIRA supports primitive data types. Another PR will support arrays and other data types.
    
    The current implementation adds compressed data by using the `putByteArray()` method, and then gets data by using a getter (e.g. `getInt()`).
    
    This implementation decompresses data in batch into an uncompressed column batch, as @rxin suggested [here](https://github.com/apache/spark/pull/18468#issuecomment-316914076). Another implementation uses an adapter approach, [as @cloud-fan suggested](https://github.com/apache/spark/pull/18468).
    
    ## How was this patch tested?
    
    Added test suites
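
The difference between the batch approach and the adapter (on-demand) approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from the PR: run-length encoding stands in for whatever compression scheme the cached column uses, and `BatchVsAdapterSketch`, `RleColumn`, `decompressAll` and `getIntOnDemand` are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical sketch, not Spark code. Run-length encoding is an assumed
// stand-in for the real compression scheme.
final class BatchVsAdapterSketch {
  // Compressed form: runs[i] consecutive copies of values[i].
  static final class RleColumn {
    final int[] values;
    final int[] runs;
    RleColumn(int[] values, int[] runs) {
      this.values = values;
      this.runs = runs;
    }
  }

  // Batch approach: decompress every row once into a plain array,
  // so that a later getInt(rowId) is a simple array read.
  static int[] decompressAll(RleColumn col, int rowCount) {
    int[] out = new int[rowCount];
    int pos = 0;
    for (int i = 0; i < col.values.length && pos < rowCount; i++) {
      int end = Math.min(pos + col.runs[i], rowCount);
      Arrays.fill(out, pos, end, col.values[i]);
      pos = end;
    }
    return out;
  }

  // Adapter approach: keep the data compressed and decode on each access.
  static int getIntOnDemand(RleColumn col, int rowId) {
    if (rowId >= 0) {
      int pos = 0;
      for (int i = 0; i < col.values.length; i++) {
        pos += col.runs[i];
        if (rowId < pos) {
          return col.values[i];
        }
      }
    }
    throw new IndexOutOfBoundsException("rowId " + rowId);
  }
}
```

In this toy version the batch path pays the decompression cost up front and uses memory for the whole uncompressed batch, while the on-demand path saves that memory but repeats decoding work on every read.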


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kiszk/spark SPARK-20783a

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18704.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18704
    
----
commit c09f05f75f85203be9032ffcdcedf7f39c668a7a
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-07-21T14:17:05Z

    initial commit for batch implementation as @rxin suggested

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80958/testReport)** for PR 18704 at commit [`a24a971`](https://github.com/apache/spark/commit/a24a971ed61f054766e3ed8212c2035f1d391d54).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82124 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82124/testReport)** for PR 18704 at commit [`b8d5dec`](https://github.com/apache/spark/commit/b8d5decfa32a8d8c1eba331a976eb2e341c40b53).



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    ping @cloud-fan & @michal-databricks 



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138409265
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
    @@ -147,6 +147,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
       public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);
     
       /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    --- End diff --
    
    I see.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @michal-databricks do you have any thoughts?



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @rxin Could you please review this, since this is the batch approach that you suggested [here](https://github.com/apache/spark/pull/18468#issuecomment-316914076)?
    Regarding the test failure, the issue is only in the test suite.



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r139661951
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
    @@ -1311,4 +1314,172 @@ class ColumnarBatchSuite extends SparkFunSuite {
         batch.close()
         allocator.close()
       }
    +
    +  test("CachedBatch boolean Apis") {
    --- End diff --
    
    I see. Moved them into `ColumnVectorSuite`



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79839/
    Test FAILed.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @rxin Could you please review this PR? 



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79839 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79839/testReport)** for PR 18704 at commit [`d6e8fef`](https://github.com/apache/spark/commit/d6e8fefd54352f103e323aabe0609e773d016200).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r135066268
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -433,6 +433,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
       public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);
     
       /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    +   */
    +  public abstract void putShorts(int rowId, int count, byte[] src, int srcIndex);
    --- End diff --
    
    done



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79838/
    Test FAILed.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82426 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82426/testReport)** for PR 18704 at commit [`c16230d`](https://github.com/apache/spark/commit/c16230d34472e0337b87ce858289fec9a1d88ab4).



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r142428792
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/compression/PassThroughEncodingSuite.scala ---
    @@ -0,0 +1,189 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.columnar.compression
    +
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
    +import org.apache.spark.sql.execution.columnar._
    +import org.apache.spark.sql.execution.columnar.ColumnarTestUtils._
    +import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
    +import org.apache.spark.sql.types.AtomicType
    +
    +class PassThroughSuite extends SparkFunSuite {
    +  val nullValue = -1
    +  testPassThrough(new ByteColumnStats, BYTE)
    +  testPassThrough(new ShortColumnStats, SHORT)
    +  testPassThrough(new IntColumnStats, INT)
    +  testPassThrough(new LongColumnStats, LONG)
    +  testPassThrough(new FloatColumnStats, FLOAT)
    +  testPassThrough(new DoubleColumnStats, DOUBLE)
    +
    +  def testPassThrough[T <: AtomicType](
    +                                        columnStats: ColumnStats,
    +                                        columnType: NativeColumnType[T]) {
    --- End diff --
    
    nit: indentation is wrong here.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @cloud-fan I updated this implementation to use `ColumnVector`, as we discussed. I would appreciate it if you could discuss the two implementations (this one and the [on-demand approach](https://github.com/apache/spark/pull/18468)) with @rxin.
    cc @ueshin



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @cloud-fan could you please review this?



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138363787
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
    @@ -147,6 +147,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
       public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);
     
       /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    --- End diff --
    
    This description is a little vague, as the input data is `byte[]`. Can we say more about this? e.g. endianness.
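
To illustrate why byte order matters when shorts are supplied as a `byte[]`, here is a minimal sketch. It is not the Spark implementation: `PutShortsFromBytesSketch` is a hypothetical class, the destination is a plain `short[]`, and `srcIndex` is assumed to be a byte offset into `src`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not Spark code.
final class PutShortsFromBytesSketch {
  // Reads `count` shorts from src, starting at byte offset srcIndex (an
  // assumption), and stores them in dst[rowId .. rowId + count) using the
  // given byte order.
  static void putShorts(
      short[] dst, int rowId, int count, byte[] src, int srcIndex, ByteOrder order) {
    ByteBuffer.wrap(src, srcIndex, count * 2)
        .order(order)
        .asShortBuffer()
        .get(dst, rowId, count);
  }
}
```

With `src = {0x01, 0x02}`, reading as little-endian gives `0x0201` and reading as big-endian gives `0x0102`, so a doc comment for such a method would need to state which order it expects.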



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r135056247
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -278,6 +555,46 @@ private[columnar] case object DictionaryEncoding extends CompressionScheme {
         }
     
         override def hasNext: Boolean = buffer.hasRemaining
    +
    +    override def decompress(columnVector: ColumnVector, capacity: Int): Unit = {
    +      val nullsBuffer = buffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +      val nullCount = ByteBufferHelper.getInt(nullsBuffer)
    +      var nextNullIndex = if (nullCount > 0) ByteBufferHelper.getInt(nullsBuffer) else -1
    +      var pos = 0
    +      var seenNulls = 0
    +      columnType.dataType match {
    +        case _: IntegerType =>
    +          while (pos < capacity) {
    +            if (pos != nextNullIndex) {
    +              val value = dictionary(buffer.getShort()).asInstanceOf[Int]
    +              columnVector.putInt(pos, value)
    --- End diff --
    
    Sure, I will do that.
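
The loop in the diff excerpt above can be sketched in self-contained form as follows. The excerpt is truncated, so this makes assumptions: the null buffer holds a count followed by ascending null positions, one short dictionary id is stored per non-null row, and the output is a pair of plain arrays rather than a `ColumnVector`. `DictionaryDecodeSketch` and `Decoded` are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, not Spark code. Assumed layout:
//   nulls: [nullCount, nullPos0, nullPos1, ...] as ints, positions ascending
//   ids:   one short dictionary id per non-null row
final class DictionaryDecodeSketch {
  static final class Decoded {
    final int[] values;
    final boolean[] isNull;
    Decoded(int capacity) {
      values = new int[capacity];
      isNull = new boolean[capacity];
    }
  }

  static Decoded decode(ByteBuffer nulls, ByteBuffer ids, int[] dictionary, int capacity) {
    Decoded out = new Decoded(capacity);
    int nullCount = nulls.getInt();
    int nextNullIndex = nullCount > 0 ? nulls.getInt() : -1;
    int seenNulls = 0;
    for (int pos = 0; pos < capacity; pos++) {
      if (pos != nextNullIndex) {
        // Non-null row: look the value up through its dictionary id.
        out.values[pos] = dictionary[ids.getShort()];
      } else {
        // Null row: record it and advance to the next null position, if any.
        out.isNull[pos] = true;
        seenNulls++;
        nextNullIndex = seenNulls < nullCount ? nulls.getInt() : -1;
      }
    }
    return out;
  }
}
```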



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82125/testReport)** for PR 18704 at commit [`549b10f`](https://github.com/apache/spark/commit/549b10fac2e3b7a8cfd9d289ab4c152e7f764a17).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81883/testReport)** for PR 18704 at commit [`bdecaaf`](https://github.com/apache/spark/commit/bdecaaf045bb135239d09919af92f718e5d0b2c5).



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by rxin <gi...@git.apache.org>.

Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    cc @michal-databricks any thoughts on this?



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81295/
    Test PASSed.



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18704



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82420 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82420/testReport)** for PR 18704 at commit [`549b10f`](https://github.com/apache/spark/commit/549b10fac2e3b7a8cfd9d289ab4c152e7f764a17).



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    ping @cloud-fan



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138409856
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
    @@ -147,6 +147,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
       public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);
     
       /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    --- End diff --
    
    @ueshin The comment at line 145 may be mistaken: `Sets values from [rowId, rowId + count) to [src + srcIndex, src + srcIndex + count)`
    It should be `Sets values from [src + srcIndex, src + srcIndex + count) to [rowId, rowId + count)`
    What do you think?
    
    If so, should we update them in this PR? Or, is it better to create another PR?



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r139605958
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
    @@ -1311,4 +1314,172 @@ class ColumnarBatchSuite extends SparkFunSuite {
         batch.close()
         allocator.close()
       }
    +
    +  test("CachedBatch boolean Apis") {
    --- End diff --
    
    move these to a new test suite



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    ping @cloud-fan 



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    thanks, merging to master!



[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79837/
    Test FAILed.



[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138838838
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/columnar/ColumnDictionary.java ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.columnar;
    +
    +import org.apache.spark.sql.execution.vectorized.Dictionary;
    +
    +public final class ColumnDictionary implements Dictionary {
    +  private Object[] dictionary;
    +
    +  public ColumnDictionary(Object[] dictionary) {
    +    this.dictionary = dictionary;
    +  }
    +
    +  @Override
    +  public int decodeToInt(int id) {
    +    return (Integer)dictionary[id];
    --- End diff --
    
    Yeah, I removed boxing.


---

---------------------------------------------------------------------

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138364852
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala ---
    @@ -149,4 +153,23 @@ private[columnar] object ColumnAccessor {
             throw new Exception(s"not support type: $other")
         }
       }
    +
    +  def decompress(columnAccessor: ColumnAccessor, columnVector: WritableColumnVector, numRows: Int):
    +      Unit = {
    +    if (columnAccessor.isInstanceOf[NativeColumnAccessor[_]]) {
    +      val nativeAccessor = columnAccessor.asInstanceOf[NativeColumnAccessor[_]]
    +      nativeAccessor.decompress(columnVector, numRows)
    +    } else {
    +      val dataBuffer = columnAccessor.asInstanceOf[BasicColumnAccessor[_]].getByteBuffer
    +      val nullsBuffer = dataBuffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +
    +      val numNulls = ByteBufferHelper.getInt(nullsBuffer)
    +      for (i <- 0 until numNulls) {
    +        val cordinal = ByteBufferHelper.getInt(nullsBuffer)
    --- End diff --
    
    typo? `ordinal`?
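
For context, the loop under review reads a null header from the start of the column buffer: an int null count followed by that many int row ordinals, in native byte order. A minimal Java sketch of that read, assuming exactly this layout (`NullHeaderSketch` and `readNullOrdinals` are hypothetical names, and plain `ByteBuffer.getInt()` stands in for `ByteBufferHelper.getInt`):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of the PR.
final class NullHeaderSketch {
  /** Reads the null count, then that many row ordinals, from the start of the buffer. */
  static int[] readNullOrdinals(ByteBuffer columnBuffer) {
    // duplicate() shares the bytes but has its own position, so the caller's cursor
    // is untouched. It also resets the byte order, hence the explicit order() call.
    ByteBuffer nulls = columnBuffer.duplicate().order(ByteOrder.nativeOrder());
    nulls.rewind();
    int numNulls = nulls.getInt();
    int[] ordinals = new int[numNulls];
    for (int i = 0; i < numNulls; i++) {
      ordinals[i] = nulls.getInt();
    }
    return ordinals;
  }
}
```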


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138363222
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/columnar/ColumnDictionary.java ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.execution.columnar;
    +
    +import org.apache.spark.sql.execution.vectorized.Dictionary;
    +
    +public final class ColumnDictionary implements Dictionary {
    +  private Object[] dictionary;
    +
    +  public ColumnDictionary(Object[] dictionary) {
    +    this.dictionary = dictionary;
    +  }
    +
    +  @Override
    +  public int decodeToInt(int id) {
    +    return (Integer)dictionary[id];
    --- End diff --
    
    Is it possible to avoid boxing here? E.g., we could have several primitive array members instead of one `Object[]`.
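
A minimal sketch of what this suggestion could look like, assuming one primitive array per supported type. `SketchDictionary` and `PrimitiveColumnDictionary` are hypothetical names used only for illustration; this is not the code that was merged, and the real `Dictionary` interface is not reproduced here.

```java
// Hypothetical stand-in for the dictionary interface, reduced to two methods.
interface SketchDictionary {
  int decodeToInt(int id);
  long decodeToLong(int id);
}

// Hypothetical variant of ColumnDictionary that stores primitives directly.
final class PrimitiveColumnDictionary implements SketchDictionary {
  private final int[] intDictionary;   // null when the column is not int-typed
  private final long[] longDictionary; // null when the column is not long-typed

  PrimitiveColumnDictionary(int[] dictionary) {
    this.intDictionary = dictionary;
    this.longDictionary = null;
  }

  PrimitiveColumnDictionary(long[] dictionary) {
    this.intDictionary = null;
    this.longDictionary = dictionary;
  }

  @Override
  public int decodeToInt(int id) {
    return intDictionary[id]; // plain array load, no Integer cast or unboxing
  }

  @Override
  public long decodeToLong(int id) {
    return longDictionary[id];
  }
}
```

Compared with `(Integer)dictionary[id]`, each lookup here is a single primitive array read, at the cost of one field per supported type.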


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    ping @rxin 


---

---------------------------------------------------------------------

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r139361487
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
    @@ -147,6 +147,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
       public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);
     
       /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    --- End diff --
    
    Let's update them in this PR. BTW, `WritableColumnVector` may be exposed to end users so that they can build columnar batches for the data source v2 columnar scan, so the documentation is very important.
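
To illustrate the contract that the doc comment describes, here is a small sketch of the `putShorts(rowId, count, src, srcIndex)` semantics: rows `[rowId, rowId + count)` receive `src[srcIndex]` through `src[srcIndex + count - 1]`. `ShortColumnSketch` is a hypothetical stand-in backed by a plain array, not Spark's `WritableColumnVector`.

```java
// Hypothetical stand-in for an on-heap short column; not Spark's class.
final class ShortColumnSketch {
  private final short[] data;

  ShortColumnSketch(int capacity) {
    this.data = new short[capacity];
  }

  /** Sets rows [rowId, rowId + count) to src[srcIndex] .. src[srcIndex + count - 1]. */
  void putShorts(int rowId, int count, short[] src, int srcIndex) {
    System.arraycopy(src, srcIndex, data, rowId, count);
  }

  short getShort(int rowId) {
    return data[rowId];
  }
}
```

For example, `putShorts(2, 3, src, 1)` copies `src[1]`, `src[2]`, and `src[3]` into rows 2, 3, and 4 and leaves every other row untouched.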


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138366156
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -61,6 +63,162 @@ private[columnar] case object PassThrough extends CompressionScheme {
         }
     
         override def hasNext: Boolean = buffer.hasRemaining
    +
    +    override def decompress(columnVector: WritableColumnVector, capacity: Int): Unit = {
    +      val nullsBuffer = buffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +      val nullCount = ByteBufferHelper.getInt(nullsBuffer)
    +      var nextNullIndex = if (nullCount > 0) ByteBufferHelper.getInt(nullsBuffer) else capacity
    +      var pos = 0
    +      var seenNulls = 0
    +      val srcArray = buffer.array
    +      var bufferPos = buffer.position
    +      columnType.dataType match {
    +        case _: BooleanType =>
    +          val unitSize = 1
    +          while (pos < capacity) {
    +            if (pos != nextNullIndex) {
    +              val len = nextNullIndex - pos
    +              assert(len * unitSize < Int.MaxValue)
    +              for (i <- 0 until len) {
    +                val value = buffer.get(bufferPos + i) != 0
    +                columnVector.putBoolean(pos + i, value)
    +              }
    +              bufferPos += len
    +              pos += len
    +            } else {
    +              seenNulls += 1
    +              nextNullIndex = if (seenNulls < nullCount) {
    +                ByteBufferHelper.getInt(nullsBuffer)
    +              } else {
    +                capacity
    +              }
    +              columnVector.putNull(pos)
    +              pos += 1
    +            }
    +          }
    +        case _: ByteType =>
    --- End diff --
    
    hmmm, is there any way to reduce the code duplication? maybe codegen?


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82426 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82426/testReport)** for PR 18704 at commit [`c16230d`](https://github.com/apache/spark/commit/c16230d34472e0337b87ce858289fec9a1d88ab4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class PassThroughSuite extends SparkFunSuite `


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82125/
    Test PASSed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81778/testReport)** for PR 18704 at commit [`6be96f8`](https://github.com/apache/spark/commit/6be96f8e6df14e60e4bc85910c96d1c3e8402c95).


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80978/
    Test FAILed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80978/testReport)** for PR 18704 at commit [`9c8960b`](https://github.com/apache/spark/commit/9c8960b44fc9893aa0f7e5c83de7b302600a41be).


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81778/
    Test PASSed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    ping @rxin


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80958/testReport)** for PR 18704 at commit [`a24a971`](https://github.com/apache/spark/commit/a24a971ed61f054766e3ed8212c2035f1d391d54).


---

---------------------------------------------------------------------

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138838983
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -61,6 +63,162 @@ private[columnar] case object PassThrough extends CompressionScheme {
         }
     
         override def hasNext: Boolean = buffer.hasRemaining
    +
    +    override def decompress(columnVector: WritableColumnVector, capacity: Int): Unit = {
    +      val nullsBuffer = buffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +      val nullCount = ByteBufferHelper.getInt(nullsBuffer)
    +      var nextNullIndex = if (nullCount > 0) ByteBufferHelper.getInt(nullsBuffer) else capacity
    +      var pos = 0
    +      var seenNulls = 0
    +      val srcArray = buffer.array
    +      var bufferPos = buffer.position
    +      columnType.dataType match {
    +        case _: BooleanType =>
    +          val unitSize = 1
    +          while (pos < capacity) {
    +            if (pos != nextNullIndex) {
    +              val len = nextNullIndex - pos
    +              assert(len * unitSize < Int.MaxValue)
    +              for (i <- 0 until len) {
    +                val value = buffer.get(bufferPos + i) != 0
    +                columnVector.putBoolean(pos + i, value)
    +              }
    +              bufferPos += len
    +              pos += len
    +            } else {
    +              seenNulls += 1
    +              nextNullIndex = if (seenNulls < nullCount) {
    +                ByteBufferHelper.getInt(nullsBuffer)
    +              } else {
    +                capacity
    +              }
    +              columnVector.putNull(pos)
    +              pos += 1
    +            }
    +          }
    +        case _: ByteType =>
    --- End diff --
    
    Removed code duplication by using a function object. How about this?
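
One way such a function object could look, sketched in Java rather than the PR's Scala. The null-skipping loop from the quoted `decompress` is written once, and the type-specific copy is passed in. `PassThroughSketch`, `RunWriter`, and `NullWriter` are hypothetical names; the null ordinals are taken as an already-parsed `int[]` for brevity; this is an illustration under those assumptions, not the code in the updated commit.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and signatures are assumptions for illustration.
final class PassThroughSketch {
  /** Function object: writes `len` consecutive values starting at row `pos`. */
  interface RunWriter {
    void putRun(ByteBuffer data, int bufferPos, int pos, int len);
  }

  /** Null sink, standing in for the column vector's putNull. */
  interface NullWriter {
    void putNull(int pos);
  }

  static void decompress(ByteBuffer data, int[] nullOrdinals, int capacity, int unitSize,
      RunWriter writer, NullWriter nulls) {
    int pos = 0;
    int seenNulls = 0;
    int bufferPos = data.position();
    int nextNullIndex = nullOrdinals.length > 0 ? nullOrdinals[0] : capacity;
    while (pos < capacity) {
      if (pos != nextNullIndex) {
        // Copy the whole run of non-null values up to the next null (or the end).
        int len = nextNullIndex - pos;
        writer.putRun(data, bufferPos, pos, len);
        bufferPos += len * unitSize; // only non-null values occupy space in the buffer
        pos += len;
      } else {
        seenNulls++;
        nextNullIndex = seenNulls < nullOrdinals.length ? nullOrdinals[seenNulls] : capacity;
        nulls.putNull(pos);
        pos++;
      }
    }
  }
}
```

Each data type then supplies only its `unitSize` and a short `RunWriter`, so the `while` loop is no longer repeated per `case`.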


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    LGTM. I think we should eventually simplify the columnar cache module and codegen most of it to reduce code duplication.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by michal-databricks <gi...@git.apache.org>.

Github user michal-databricks commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    I don't fully understand the big picture here, but I assume the goal is more efficient access to the data stored in the compressible Spark columnar cache.
    Either way, this looks OK to me.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82122/testReport)** for PR 18704 at commit [`1607bd1`](https://github.com/apache/spark/commit/1607bd152c64bf7900e489eb2cbef086f44e0861).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81093 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81093/testReport)** for PR 18704 at commit [`fb0d4e5`](https://github.com/apache/spark/commit/fb0d4e53a0c3055b54b4cb45a080fb68613b281c).


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79843 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79843/testReport)** for PR 18704 at commit [`bd0c334`](https://github.com/apache/spark/commit/bd0c3340c06f25522cb76a95239f679cb01a04ac).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @cloud-fan I resolved the conflict. Could you please review?


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82420/
    Test FAILed.


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80982/testReport)** for PR 18704 at commit [`8f542b0`](https://github.com/apache/spark/commit/8f542b0046ae667061c0e368e2c82c1833754bd4).


---

---------------------------------------------------------------------

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79843/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81883/testReport)** for PR 18704 at commit [`bdecaaf`](https://github.com/apache/spark/commit/bdecaaf045bb135239d09919af92f718e5d0b2c5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r139364602
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -169,6 +267,125 @@ private[columnar] case object RunLengthEncoding extends CompressionScheme {
         }
     
         override def hasNext: Boolean = valueCount < run || buffer.hasRemaining
    +
    +    override def decompress(columnVector: WritableColumnVector, capacity: Int): Unit = {
    +      val nullsBuffer = buffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +      val nullCount = ByteBufferHelper.getInt(nullsBuffer)
    +      var nextNullIndex = if (nullCount > 0) ByteBufferHelper.getInt(nullsBuffer) else -1
    +      var pos = 0
    +      var seenNulls = 0
    +      var runLocal = 0
    +      var valueCountLocal = 0
    +      columnType.dataType match {
    +        case _: BooleanType =>
    --- End diff --
    
    same here, can we reduce code duplication?
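
A rough Java sketch of how the run-length case could share one loop across types. It assumes the buffer holds (value, int run length) pairs and that null ordinals are already parsed into an `int[]`. `RunLengthSketch`, `ValueReader`, and `Sink` are hypothetical names. The generic `T` keeps the sketch short but boxes primitives, so a real implementation would likely specialize per type.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; names and the assumed buffer layout are for illustration only.
final class RunLengthSketch {
  /** Function object: reads one value of the column's type from the buffer. */
  interface ValueReader<T> {
    T read(ByteBuffer data);
  }

  /** Sink standing in for the typed put methods and putNull of a column vector. */
  interface Sink<T> {
    void put(int pos, T value);
    void putNull(int pos);
  }

  static <T> void decompress(ByteBuffer data, int[] nullOrdinals, int capacity,
      ValueReader<T> reader, Sink<T> sink) {
    int seenNulls = 0;
    int run = 0;        // length of the current run
    int valueCount = 0; // values already emitted from the current run
    T current = null;
    for (int pos = 0; pos < capacity; pos++) {
      if (seenNulls < nullOrdinals.length && nullOrdinals[seenNulls] == pos) {
        // Nulls are not stored in the runs, so they do not consume a run slot.
        seenNulls++;
        sink.putNull(pos);
      } else {
        if (valueCount == run) {
          // Current run is exhausted: read the next (value, run length) pair.
          current = reader.read(data);
          run = data.getInt();
          valueCount = 0;
        }
        valueCount++;
        sink.put(pos, current);
      }
    }
  }
}
```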


---

---------------------------------------------------------------------

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r139362028
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/CompressibleColumnAccessor.scala ---
    @@ -17,8 +17,11 @@
     
     package org.apache.spark.sql.execution.columnar.compression
     
    +import java.nio.ByteBuffer
    --- End diff --
    
    unnecessary import


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81295/testReport)** for PR 18704 at commit [`097fc05`](https://github.com/apache/spark/commit/097fc0502b059222f4cbc77c4aa0019bf013b6a3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class ColumnDictionary implements Dictionary `


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80958/
    Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    ping @cloud-fan


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    I will rebase this in the next few hours.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79839 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79839/testReport)** for PR 18704 at commit [`d6e8fef`](https://github.com/apache/spark/commit/d6e8fefd54352f103e323aabe0609e773d016200).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79838 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79838/testReport)** for PR 18704 at commit [`ec368d8`](https://github.com/apache/spark/commit/ec368d85ca99f8e9dffff4846f93d8feef43081d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class CachedBatchColumnVector extends ReadOnlyColumnVector `


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @cloud-fan could you please review this again?


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82426/
    Test PASSed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81924 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81924/testReport)** for PR 18704 at commit [`2902c5b`](https://github.com/apache/spark/commit/2902c5bb2a92d47d1b6b2b4f1a51ee808acc3789).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @cloud-fan I merged with the latest master and addressed your comment about indentation.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82420 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82420/testReport)** for PR 18704 at commit [`549b10f`](https://github.com/apache/spark/commit/549b10fac2e3b7a8cfd9d289ab4c152e7f764a17).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    retest this please


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80961/testReport)** for PR 18704 at commit [`6367a4c`](https://github.com/apache/spark/commit/6367a4c85792e8c7b4337b1e7d9f9f2ee741975e).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81295/testReport)** for PR 18704 at commit [`097fc05`](https://github.com/apache/spark/commit/097fc0502b059222f4cbc77c4aa0019bf013b6a3).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79843 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79843/testReport)** for PR 18704 at commit [`bd0c334`](https://github.com/apache/spark/commit/bd0c3340c06f25522cb76a95239f679cb01a04ac).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81883/
    Test PASSed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82124/testReport)** for PR 18704 at commit [`b8d5dec`](https://github.com/apache/spark/commit/b8d5decfa32a8d8c1eba331a976eb2e341c40b53).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @michal-databricks Thank you for your review and comments.
    ping @cloud-fan 


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    ping @cloud-fan 


---

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r135056371
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -433,6 +433,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
       public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);
     
       /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    +   */
    +  public abstract void putShorts(int rowId, int count, byte[] src, int srcIndex);
    --- End diff --
    
    Got it. Rebased in my local version.


---

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r135043913
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -278,6 +555,46 @@ private[columnar] case object DictionaryEncoding extends CompressionScheme {
         }
     
         override def hasNext: Boolean = buffer.hasRemaining
    +
    +    override def decompress(columnVector: ColumnVector, capacity: Int): Unit = {
    +      val nullsBuffer = buffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +      val nullCount = ByteBufferHelper.getInt(nullsBuffer)
    +      var nextNullIndex = if (nullCount > 0) ByteBufferHelper.getInt(nullsBuffer) else -1
    +      var pos = 0
    +      var seenNulls = 0
    +      columnType.dataType match {
    +        case _: IntegerType =>
    +          while (pos < capacity) {
    +            if (pos != nextNullIndex) {
    +              val value = dictionary(buffer.getShort()).asInstanceOf[Int]
    +              columnVector.putInt(pos, value)
    --- End diff --
    
    can we delay the decompression and set the dictionary on `ColumnVector` instead?
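
To illustrate the idea behind this question, here is a minimal sketch (in Java, and not the PR's implementation): instead of expanding every dictionary-encoded value up front, a column can keep the compact ids together with the dictionary and resolve a value only when it is read. `LazyDictionaryIntColumn` and its methods are hypothetical names for illustration and do not correspond to Spark's `ColumnVector` API; null handling is omitted.

```java
import java.util.Arrays;

// Hypothetical sketch, not Spark code: a dictionary-backed int column.
final class LazyDictionaryIntColumn {
    private final int[] dictionary;  // distinct values, indexed by id
    private final short[] ids;       // one dictionary id per row

    LazyDictionaryIntColumn(int[] dictionary, short[] ids) {
        this.dictionary = dictionary;
        this.ids = ids;
    }

    int numRows() {
        return ids.length;
    }

    /** Lazy path: decodes a single row on demand, nothing is materialized. */
    int getInt(int rowId) {
        return dictionary[ids[rowId]];
    }

    /** Eager path, for comparison: expands every row into a new array. */
    int[] decompressAll() {
        int[] values = new int[ids.length];
        for (int i = 0; i < ids.length; i++) {
            values[i] = dictionary[ids[i]];
        }
        return values;
    }

    public static void main(String[] args) {
        LazyDictionaryIntColumn column =
            new LazyDictionaryIntColumn(new int[]{100, 200, 300}, new short[]{2, 0, 0, 1});
        System.out.println(column.getInt(0));                      // lazy lookup of one row
        System.out.println(Arrays.toString(column.decompressAll())); // eager expansion
    }
}
```

The trade-off in this sketch is that the lazy path pays one extra array lookup per read but avoids allocating and filling a full-width array for rows that may never be accessed.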


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79838 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79838/testReport)** for PR 18704 at commit [`ec368d8`](https://github.com/apache/spark/commit/ec368d85ca99f8e9dffff4846f93d8feef43081d).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    LGTM, pending jenkins


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79837/testReport)** for PR 18704 at commit [`c09f05f`](https://github.com/apache/spark/commit/c09f05f75f85203be9032ffcdcedf7f39c668a7a).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82122/
    Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81924/
    Test PASSed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82122 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82122/testReport)** for PR 18704 at commit [`1607bd1`](https://github.com/apache/spark/commit/1607bd152c64bf7900e489eb2cbef086f44e0861).


---

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138838744
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala ---
    @@ -149,4 +153,23 @@ private[columnar] object ColumnAccessor {
             throw new Exception(s"not support type: $other")
         }
       }
    +
    +  def decompress(columnAccessor: ColumnAccessor, columnVector: WritableColumnVector, numRows: Int):
    +      Unit = {
    +    if (columnAccessor.isInstanceOf[NativeColumnAccessor[_]]) {
    +      val nativeAccessor = columnAccessor.asInstanceOf[NativeColumnAccessor[_]]
    +      nativeAccessor.decompress(columnVector, numRows)
    +    } else {
    +      val dataBuffer = columnAccessor.asInstanceOf[BasicColumnAccessor[_]].getByteBuffer
    +      val nullsBuffer = dataBuffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +
    +      val numNulls = ByteBufferHelper.getInt(nullsBuffer)
    +      for (i <- 0 until numNulls) {
    +        val cordinal = ByteBufferHelper.getInt(nullsBuffer)
    --- End diff --
    
    good catch, done
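
The null-position header that this hunk reads can be sketched on its own. This is a minimal Java illustration, assuming (as the quoted code suggests) that the buffer starts with a 4-byte null count followed by that many 4-byte row ordinals in native byte order. `NullHeaderSketch` and `readNullPositions` are hypothetical names, and plain `ByteBuffer.getInt` stands in for Spark's `ByteBufferHelper.getInt`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical sketch, not Spark code: reading the null-position header.
final class NullHeaderSketch {
    /**
     * Returns the row ordinals marked as null. Assumed layout at the start
     * of the buffer: [int nullCount][int ordinal] * nullCount.
     */
    static int[] readNullPositions(ByteBuffer data) {
        // duplicate() shares the bytes but has its own position, so the
        // caller's read position in `data` is left untouched. The duplicate
        // does not inherit the byte order, hence the explicit order() call.
        ByteBuffer nulls = data.duplicate().order(ByteOrder.nativeOrder());
        nulls.rewind();
        int numNulls = nulls.getInt();
        int[] positions = new int[numNulls];
        for (int i = 0; i < numNulls; i++) {
            positions[i] = nulls.getInt();
        }
        return positions;
    }

    public static void main(String[] args) {
        // Header says: 2 nulls, at rows 1 and 4; two data ints follow.
        ByteBuffer data = ByteBuffer.allocate(20).order(ByteOrder.nativeOrder());
        data.putInt(2).putInt(1).putInt(4).putInt(100).putInt(200);
        data.position(12); // pretend the caller is already reading the data part
        System.out.println(Arrays.toString(readNullPositions(data)));
        System.out.println(data.position());
    }
}
```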


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80964/testReport)** for PR 18704 at commit [`9c8960b`](https://github.com/apache/spark/commit/9c8960b44fc9893aa0f7e5c83de7b302600a41be).


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80982/
    Test PASSed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81093 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81093/testReport)** for PR 18704 at commit [`fb0d4e5`](https://github.com/apache/spark/commit/fb0d4e53a0c3055b54b4cb45a080fb68613b281c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class ColumnDictionary implements Dictionary `


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80961/
    Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80982 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80982/testReport)** for PR 18704 at commit [`8f542b0`](https://github.com/apache/spark/commit/8f542b0046ae667061c0e368e2c82c1833754bd4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80961/testReport)** for PR 18704 at commit [`6367a4c`](https://github.com/apache/spark/commit/6367a4c85792e8c7b4337b1e7d9f9f2ee741975e).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138838698
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala ---
    @@ -149,4 +153,23 @@ private[columnar] object ColumnAccessor {
             throw new Exception(s"not support type: $other")
         }
       }
    +
    +  def decompress(columnAccessor: ColumnAccessor, columnVector: WritableColumnVector, numRows: Int):
    +      Unit = {
    +    if (columnAccessor.isInstanceOf[NativeColumnAccessor[_]]) {
    +      val nativeAccessor = columnAccessor.asInstanceOf[NativeColumnAccessor[_]]
    +      nativeAccessor.decompress(columnVector, numRows)
    +    } else {
    +      val dataBuffer = columnAccessor.asInstanceOf[BasicColumnAccessor[_]].getByteBuffer
    +      val nullsBuffer = dataBuffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +
    +      val numNulls = ByteBufferHelper.getInt(nullsBuffer)
    +      for (i <- 0 until numNulls) {
    +        val cordinal = ByteBufferHelper.getInt(nullsBuffer)
    +        columnVector.putNull(cordinal)
    +      }
    +      throw new RuntimeException("Not support non-primitive type now")
    --- End diff --
    
    thanks, fixed.


---


[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r135066290
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/compression/compressionSchemes.scala ---
    @@ -278,6 +555,46 @@ private[columnar] case object DictionaryEncoding extends CompressionScheme {
         }
     
         override def hasNext: Boolean = buffer.hasRemaining
    +
    +    override def decompress(columnVector: ColumnVector, capacity: Int): Unit = {
    +      val nullsBuffer = buffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +      val nullCount = ByteBufferHelper.getInt(nullsBuffer)
    +      var nextNullIndex = if (nullCount > 0) ByteBufferHelper.getInt(nullsBuffer) else -1
    +      var pos = 0
    +      var seenNulls = 0
    +      columnType.dataType match {
    +        case _: IntegerType =>
    +          while (pos < capacity) {
    +            if (pos != nextNullIndex) {
    +              val value = dictionary(buffer.getShort()).asInstanceOf[Int]
    +              columnVector.putInt(pos, value)
    --- End diff --
    
    done
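
The quoted diff is cut off partway through the decode loop. As a reading aid, here is a minimal sketch of how such a loop might run to completion. It assumes a layout inferred from the visible lines only: a null header (a count followed by ascending null ordinals) and a run of `short` dictionary indices with no entry stored for null rows. `DictionaryDecodeSketch`, `Decoded`, `intBuffer`, and `shortBuffer` are hypothetical names, and the two-buffer signature is a simplification, not the PR's code.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object DictionaryDecodeSketch {
  // Hypothetical result holder standing in for a writable column vector.
  final case class Decoded(values: Array[Int], isNull: Array[Boolean])

  // Assumed layout: nullHeader = [nullCount, ordinal0, ordinal1, ...] as ints,
  // indices = one short per non-null row, in row order.
  def decode(
      nullHeader: ByteBuffer,
      indices: ByteBuffer,
      dictionary: Array[Int],
      capacity: Int): Decoded = {
    val nulls = nullHeader.duplicate().order(ByteOrder.nativeOrder())
    nulls.rewind()
    val nullCount = nulls.getInt()
    var nextNullIndex = if (nullCount > 0) nulls.getInt() else -1
    var seenNulls = 0

    val data = indices.duplicate().order(ByteOrder.nativeOrder())
    val values = new Array[Int](capacity)
    val isNull = new Array[Boolean](capacity)

    var pos = 0
    while (pos < capacity) {
      if (pos != nextNullIndex) {
        // Non-null row: look the value up through its dictionary index.
        values(pos) = dictionary(data.getShort())
      } else {
        // Null row: mark it and advance to the next null ordinal, if any.
        isNull(pos) = true
        seenNulls += 1
        nextNullIndex = if (seenNulls < nullCount) nulls.getInt() else -1
      }
      pos += 1
    }
    Decoded(values, isNull)
  }

  // Helpers for building native-order test buffers.
  def intBuffer(xs: Int*): ByteBuffer = {
    val buf = ByteBuffer.allocate(4 * xs.length).order(ByteOrder.nativeOrder())
    xs.foreach(x => buf.putInt(x))
    buf.rewind()
    buf
  }

  def shortBuffer(xs: Int*): ByteBuffer = {
    val buf = ByteBuffer.allocate(2 * xs.length).order(ByteOrder.nativeOrder())
    xs.foreach(x => buf.putShort(x.toShort))
    buf.rewind()
    buf
  }
}
```

With the dictionary `Array(10, 20, 30)`, a header declaring one null at ordinal 1, and the indices 0, 2, 0, four rows decode to 10, null, 30, 10.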


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80978/testReport)** for PR 18704 at commit [`9c8960b`](https://github.com/apache/spark/commit/9c8960b44fc9893aa0f7e5c83de7b302600a41be).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r138365192
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/ColumnAccessor.scala ---
    @@ -149,4 +153,23 @@ private[columnar] object ColumnAccessor {
             throw new Exception(s"not support type: $other")
         }
       }
    +
    +  def decompress(columnAccessor: ColumnAccessor, columnVector: WritableColumnVector, numRows: Int):
    +      Unit = {
    +    if (columnAccessor.isInstanceOf[NativeColumnAccessor[_]]) {
    +      val nativeAccessor = columnAccessor.asInstanceOf[NativeColumnAccessor[_]]
    +      nativeAccessor.decompress(columnVector, numRows)
    +    } else {
    +      val dataBuffer = columnAccessor.asInstanceOf[BasicColumnAccessor[_]].getByteBuffer
    +      val nullsBuffer = dataBuffer.duplicate().order(ByteOrder.nativeOrder())
    +      nullsBuffer.rewind()
    +
    +      val numNulls = ByteBufferHelper.getInt(nullsBuffer)
    +      for (i <- 0 until numNulls) {
    +        val cordinal = ByteBufferHelper.getInt(nullsBuffer)
    +        columnVector.putNull(cordinal)
    +      }
    +      throw new RuntimeException("Not support non-primitive type now")
    --- End diff --
    
    If we need to throw an exception at the end anyway, why not throw it at the beginning?
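
To illustrate the point of this review comment, here is a minimal sketch contrasting the two orderings. It is not the PR's code: `FailFastSketch`, `NullSink`, `copyNulls`, and `header` are hypothetical stand-ins, and the null header layout (a count followed by that many ordinals) is assumed from the quoted diff.

```scala
import java.nio.{ByteBuffer, ByteOrder}
import scala.collection.mutable

object FailFastSketch {
  // Hypothetical stand-in for a writable column vector: records null ordinals only.
  final class NullSink {
    val nulls: mutable.ArrayBuffer[Int] = mutable.ArrayBuffer.empty[Int]
    def putNull(ordinal: Int): Unit = nulls += ordinal
  }

  // Copies the null ordinals from the header (count, then ordinals) into the sink.
  private def copyNulls(header: ByteBuffer, sink: NullSink): Unit = {
    val buf = header.duplicate().order(ByteOrder.nativeOrder())
    buf.rewind()
    val numNulls = buf.getInt()
    for (_ <- 0 until numNulls) sink.putNull(buf.getInt())
  }

  // Shape of the reviewed code: the sink is mutated, then the call fails anyway.
  def decompressThrowLast(supported: Boolean, header: ByteBuffer, sink: NullSink): Unit = {
    copyNulls(header, sink)
    if (!supported) {
      throw new UnsupportedOperationException("non-primitive types are not supported yet")
    }
  }

  // Fail-fast shape: reject first, so a failed call leaves the sink untouched.
  def decompressThrowFirst(supported: Boolean, header: ByteBuffer, sink: NullSink): Unit = {
    if (!supported) {
      throw new UnsupportedOperationException("non-primitive types are not supported yet")
    }
    copyNulls(header, sink)
  }

  // Builds a native-order null header for the given ordinals.
  def header(ordinals: Int*): ByteBuffer = {
    val buf = ByteBuffer.allocate(4 + 4 * ordinals.length).order(ByteOrder.nativeOrder())
    buf.putInt(ordinals.length)
    ordinals.foreach(o => buf.putInt(o))
    buf.rewind()
    buf
  }
}
```

In the throw-last shape, the caller sees an exception and a partially written vector. In the throw-first shape, the work is skipped and the vector is left as it was.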


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #79837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79837/testReport)** for PR 18704 at commit [`c09f05f`](https://github.com/apache/spark/commit/c09f05f75f85203be9032ffcdcedf7f39c668a7a).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82124/
    Test FAILed.


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @cloud-fan As you proposed before, we will first work on reading the table cache, which is executed frequently.
    Then we will work on optimizing columnar table cache building in other PRs.


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81778/testReport)** for PR 18704 at commit [`6be96f8`](https://github.com/apache/spark/commit/6be96f8e6df14e60e4bc85910c96d1c3e8402c95).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80964/
    Test FAILed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by kiszk <gi...@git.apache.org>.

Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    @cloud-fan Could you please review this again?


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    retest this please


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #81924 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81924/testReport)** for PR 18704 at commit [`2902c5b`](https://github.com/apache/spark/commit/2902c5bb2a92d47d1b6b2b4f1a51ee808acc3789).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #82125 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82125/testReport)** for PR 18704 at commit [`549b10f`](https://github.com/apache/spark/commit/549b10fac2e3b7a8cfd9d289ab4c152e7f764a17).


---


[GitHub] spark issue #18704: [SPARK-20783][SQL] Create ColumnVector to abstract exist...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81093/
    Test PASSed.


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    Merged build finished. Test FAILed.


---

[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r142428980
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/compression/RunLengthEncodingSuite.scala ---
    @@ -21,19 +21,22 @@ import org.apache.spark.SparkFunSuite
     import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
     import org.apache.spark.sql.execution.columnar._
     import org.apache.spark.sql.execution.columnar.ColumnarTestUtils._
    +import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
     import org.apache.spark.sql.types.AtomicType
     
     class RunLengthEncodingSuite extends SparkFunSuite {
    +  val nullValue = -1
       testRunLengthEncoding(new NoopColumnStats, BOOLEAN)
       testRunLengthEncoding(new ByteColumnStats, BYTE)
       testRunLengthEncoding(new ShortColumnStats, SHORT)
       testRunLengthEncoding(new IntColumnStats, INT)
       testRunLengthEncoding(new LongColumnStats, LONG)
    -  testRunLengthEncoding(new StringColumnStats, STRING)
    +  testRunLengthEncoding(new StringColumnStats, STRING, false)
     
       def testRunLengthEncoding[T <: AtomicType](
    -      columnStats: ColumnStats,
    -      columnType: NativeColumnType[T]) {
    +                                              columnStats: ColumnStats,
    +                                              columnType: NativeColumnType[T],
    +                                              testDecompress: Boolean = true) {
    --- End diff --
    
    ditto


---


[GitHub] spark pull request #18704: [SPARK-20783][SQL] Create ColumnVector to abstrac...

Posted by cloud-fan <gi...@git.apache.org>.

Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18704#discussion_r135042826
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java ---
    @@ -433,6 +433,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
       public abstract void putShorts(int rowId, int count, short[] src, int srcIndex);
     
       /**
    +   * Sets values from [rowId, rowId + count) to [src[srcIndex], src[srcIndex + count])
    +   */
    +  public abstract void putShorts(int rowId, int count, byte[] src, int srcIndex);
    --- End diff --
    
    Now we can move them into `WritableColumnVector`.
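
The split this comment suggests can be sketched as follows: getters stay on a read-only base class and every `put*` method, including the new `byte[]` overload, lives on a writable subclass. `ReadableShortColumn` and `WritableShortColumn` are hypothetical names and this is not Spark's actual class hierarchy. The sketch also assumes that `srcIndex` in the `byte[]` overload is a byte offset and that the bytes are in native byte order.

```scala
import java.nio.{ByteBuffer, ByteOrder}

object WritableSplitSketch {
  // Read-only view: consumers of a column can only call getters.
  abstract class ReadableShortColumn {
    def getShort(rowId: Int): Short
  }

  // Writable subclass: all mutators live here, so read-only code never sees them.
  final class WritableShortColumn(capacity: Int) extends ReadableShortColumn {
    private val data = new Array[Short](capacity)

    override def getShort(rowId: Int): Short = data(rowId)

    // Sets rows [rowId, rowId + count) from src[srcIndex, srcIndex + count).
    def putShorts(rowId: Int, count: Int, src: Array[Short], srcIndex: Int): Unit =
      System.arraycopy(src, srcIndex, data, rowId, count)

    // Same, but reads count native-order shorts starting at byte offset srcIndex.
    def putShorts(rowId: Int, count: Int, src: Array[Byte], srcIndex: Int): Unit = {
      val view = ByteBuffer
        .wrap(src, srcIndex, count * 2)
        .order(ByteOrder.nativeOrder())
        .asShortBuffer()
      view.get(data, rowId, count)
      ()
    }
  }
}
```

Code that only reads a column can then accept the base type, while decompression code takes the writable type explicitly.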


---

[GitHub] spark issue #18704: [SPARK-20783][SQL] Create CachedBatchColumnVector to abs...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18704
  
    **[Test build #80964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80964/testReport)** for PR 18704 at commit [`9c8960b`](https://github.com/apache/spark/commit/9c8960b44fc9893aa0f7e5c83de7b302600a41be).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---