You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by cloud-fan <gi...@git.apache.org> on 2018/01/31 07:37:24 UTC

[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/20450

    [SPARK-23280][SQL] add map type support to ColumnVector

    ## What changes were proposed in this pull request?
    
    Fill the last missing piece of `ColumnVector`: the map type support.
    
    The idea is similar to the array type support. A map is basically 2 arrays for keys and values. We ask the implementations to provide a key array, a value array, and an offset and length to specify the range of this map in the key/value array.
    
    In `WritableColumnVector`, we put the key array in first child vector, and value array in second child vector, and offsets and lengths in the current vector, which is very similar to how array type is implemented here.
    
    ## How was this patch tested?
    
    a new test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark map

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20450.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20450
    
----
commit 8e66fb5077c68ff1378ec2a9e4238f9d73af5fb4
Author: Wenchen Fan <we...@...>
Date:   2018-01-30T10:52:12Z

    add map type support to ColumnVector

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20450


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r164995018
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
    @@ -761,6 +761,43 @@ class ColumnarBatchSuite extends SparkFunSuite {
         }
       }
     
    +  test("Int Map") {
    +    (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode =>
    +      val column = allocate(10, new MapType(IntegerType, IntegerType, false), memMode)
    +      (0 to 1).foreach { colIndex =>
    +        val data = column.getChild(colIndex)
    +        (0 to 5).foreach {i =>
    +          data.putInt(i, i * (colIndex + 1))
    +        }
    +      }
    +
    +      // Populate it with maps [0->0], [1->2, 2->4], [], [3->6, 4->8, 5->10]
    +      column.putArray(0, 0, 1)
    +      column.putArray(1, 1, 2)
    +      column.putNull(2)
    +      column.putArray(3, 2, 0)
    --- End diff --
    
    Yes, so the offset of the next array should be `3`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r164993656
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
    @@ -761,6 +761,43 @@ class ColumnarBatchSuite extends SparkFunSuite {
         }
       }
     
    +  test("Int Map") {
    +    (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode =>
    +      val column = allocate(10, new MapType(IntegerType, IntegerType, false), memMode)
    +      (0 to 1).foreach { colIndex =>
    +        val data = column.getChild(colIndex)
    +        (0 to 5).foreach {i =>
    +          data.putInt(i, i * (colIndex + 1))
    +        }
    +      }
    +
    +      // Populate it with maps [0->0], [1->2, 2->4], [], [3->6, 4->8, 5->10]
    +      column.putArray(0, 0, 1)
    +      column.putArray(1, 1, 2)
    +      column.putNull(2)
    +      column.putArray(3, 2, 0)
    --- End diff --
    
    the key array is: 0, 1, 2, 3, 4, 5
    the value array is: 0, 2, 4, 6, 8, 10
    
    Not that, map `[1->2, 2->4]` contributes 2 keys and values.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    cc @hvanhovell @viirya @ueshin @kiszk @gatorsmile @dongjoon-hyun 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    thanks, merging to master/2.3!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86866 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86866/testReport)** for PR 20450 at commit [`8e66fb5`](https://github.com/apache/spark/commit/8e66fb5077c68ff1378ec2a9e4238f9d73af5fb4).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165067849
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
     
       /**
        * Returns the map type value for rowId.
    +   *
    +   * In Spark, map type value is basically a key data array and a value data array. A key from the
    +   * key array with a index and a value from the value array with the same index contribute to
    +   * an entry of this map type value.
    +   *
    +   * To support map type, implementations must construct an {@link ColumnarMap} and return it in
    --- End diff --
    
    construct an -> construct a.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    We should also enable `getMap()` of `ColumnarArray`/`ColumnarRow`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/428/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86883/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r164997996
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
    @@ -761,6 +761,43 @@ class ColumnarBatchSuite extends SparkFunSuite {
         }
       }
     
    +  test("Int Map") {
    +    (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode =>
    +      val column = allocate(10, new MapType(IntegerType, IntegerType, false), memMode)
    +      (0 to 1).foreach { colIndex =>
    +        val data = column.getChild(colIndex)
    +        (0 to 5).foreach {i =>
    +          data.putInt(i, i * (colIndex + 1))
    +        }
    +      }
    +
    +      // Populate it with maps [0->0], [1->2, 2->4], [], [3->6, 4->8, 5->10]
    +      column.putArray(0, 0, 1)
    +      column.putArray(1, 1, 2)
    +      column.putNull(2)
    +      column.putArray(3, 2, 0)
    --- End diff --
    
    ah i see, you are right!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86868 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86868/testReport)** for PR 20450 at commit [`2603d02`](https://github.com/apache/spark/commit/2603d024fb9d70d21b96c325d720788e3c075b3a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165230542
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
    @@ -530,7 +530,7 @@ public int putByteArray(int rowId, byte[] value, int offset, int length) {
       @Override
       protected void reserveInternal(int newCapacity) {
         int oldCapacity = (nulls == 0L) ? 0 : capacity;
    -    if (isArray()) {
    +    if (isArray() || type instanceof MapType) {
    --- End diff --
    
    nit: we may also have a method `isMap()`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r164996133
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
    @@ -761,6 +761,43 @@ class ColumnarBatchSuite extends SparkFunSuite {
         }
       }
     
    +  test("Int Map") {
    +    (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode =>
    +      val column = allocate(10, new MapType(IntegerType, IntegerType, false), memMode)
    +      (0 to 1).foreach { colIndex =>
    +        val data = column.getChild(colIndex)
    +        (0 to 5).foreach {i =>
    +          data.putInt(i, i * (colIndex + 1))
    +        }
    +      }
    +
    +      // Populate it with maps [0->0], [1->2, 2->4], [], [3->6, 4->8, 5->10]
    --- End diff --
    
    `// Populate it with maps [0->0], [1->2, 2->4], null, [], [3->6, 4->8, 5->10]`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86866/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165231132
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
     
       /**
        * Returns the map type value for rowId.
    +   *
    +   * In Spark, map type value is basically a key data array and a value data array. A key from the
    +   * key array with a index and a value from the value array with the same index contribute to
    +   * an entry of this map type value.
    +   *
    +   * To support map type, implementations must construct an {@link ColumnarMap} and return it in
    +   * this method. {@link ColumnarMap} requires a {@link ColumnVector} that stores the data of all
    +   * the keys of all the maps in this vector, and another {@link ColumnVector} that stores the data
    --- End diff --
    
    nit: `maps` -> `map entries` ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86896 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86896/testReport)** for PR 20450 at commit [`182b404`](https://github.com/apache/spark/commit/182b4041e31b55c2c07c14813a66d6904ae61c19).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165242471
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
     
       /**
        * Returns the map type value for rowId.
    +   *
    +   * In Spark, map type value is basically a key data array and a value data array. A key from the
    +   * key array with a index and a value from the value array with the same index contribute to
    +   * an entry of this map type value.
    +   *
    +   * To support map type, implementations must construct an {@link ColumnarMap} and return it in
    +   * this method. {@link ColumnarMap} requires a {@link ColumnVector} that stores the data of all
    +   * the keys of all the maps in this vector, and another {@link ColumnVector} that stores the data
    --- End diff --
    
    `keys of map entries` sounds weird...


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165098764
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
     
       /**
        * Returns the map type value for rowId.
    +   *
    +   * In Spark, map type value is basically a key data array and a value data array. A key from the
    +   * key array with a index and a value from the value array with the same index contribute to
    +   * an entry of this map type value.
    +   *
    +   * To support map type, implementations must construct an {@link ColumnarMap} and return it in
    +   * this method. {@link ColumnarMap} requires a {@link ColumnVector} that stores the data of all
    +   * the keys of all the maps in this vector, and another {@link ColumnVector} that stores the data
    +   * of all the values of all the maps in this vector, and an offset and length which specifies the
    --- End diff --
    
    nit: ` an offset and length` -> `a pair of offset and length`? Or `specifies` -> `specify`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165242689
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
     
       /**
        * Returns the map type value for rowId.
    +   *
    +   * In Spark, map type value is basically a key data array and a value data array. A key from the
    +   * key array with a index and a value from the value array with the same index contribute to
    +   * an entry of this map type value.
    +   *
    +   * To support map type, implementations must construct an {@link ColumnarMap} and return it in
    --- End diff --
    
    This is very minor, may not worth to wait for another QA round. Maybe we can fix it in your "return null" PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/449/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165242508
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarMap.java ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.vectorized;
    +
    +import org.apache.spark.sql.catalyst.util.MapData;
    +
    +/**
    + * Map abstraction in {@link ColumnVector}.
    + */
    +public final class ColumnarMap extends MapData {
    +  private final ColumnarArray keys;
    +  private final ColumnarArray values;
    +  private final int length;
    +
    +  public ColumnarMap(ColumnVector keys, ColumnVector values, int offset, int length) {
    +    this.length = length;
    +    this.keys = new ColumnarArray(keys, offset, length);
    +    this.values = new ColumnarArray(values, offset, length);
    +  }
    +
    +  @Override
    +  public int numElements() { return length; }
    --- End diff --
    
    This is a API from parent, we can change it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86896/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86883/testReport)** for PR 20450 at commit [`182b404`](https://github.com/apache/spark/commit/182b4041e31b55c2c07c14813a66d6904ae61c19).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    @ueshin Ok. No problem. :)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    @viirya Thanks! but I'm working on it. I'll do it soon.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86883/testReport)** for PR 20450 at commit [`182b404`](https://github.com/apache/spark/commit/182b4041e31b55c2c07c14813a66d6904ae61c19).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86872 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86872/testReport)** for PR 20450 at commit [`48c275f`](https://github.com/apache/spark/commit/48c275f6562653b259ba007906d94563f5ba010d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r164986589
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnarBatchSuite.scala ---
    @@ -761,6 +761,43 @@ class ColumnarBatchSuite extends SparkFunSuite {
         }
       }
     
    +  test("Int Map") {
    +    (MemoryMode.ON_HEAP :: MemoryMode.OFF_HEAP :: Nil).foreach { memMode =>
    +      val column = allocate(10, new MapType(IntegerType, IntegerType, false), memMode)
    +      (0 to 1).foreach { colIndex =>
    +        val data = column.getChild(colIndex)
    +        (0 to 5).foreach {i =>
    +          data.putInt(i, i * (colIndex + 1))
    +        }
    +      }
    +
    +      // Populate it with maps [0->0], [1->2, 2->4], [], [3->6, 4->8, 5->10]
    +      column.putArray(0, 0, 1)
    +      column.putArray(1, 1, 2)
    +      column.putNull(2)
    +      column.putArray(3, 2, 0)
    --- End diff --
    
    `column.putArray(3, 3, 0)`?
    Seems like line 670 is the same mistake?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86872/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86866 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86866/testReport)** for PR 20450 at commit [`8e66fb5`](https://github.com/apache/spark/commit/8e66fb5077c68ff1378ec2a9e4238f9d73af5fb4).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `public final class ColumnarMap extends MapData `


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165231647
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarMap.java ---
    @@ -0,0 +1,53 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.vectorized;
    +
    +import org.apache.spark.sql.catalyst.util.MapData;
    +
    +/**
    + * Map abstraction in {@link ColumnVector}.
    + */
    +public final class ColumnarMap extends MapData {
    +  private final ColumnarArray keys;
    +  private final ColumnarArray values;
    +  private final int length;
    +
    +  public ColumnarMap(ColumnVector keys, ColumnVector values, int offset, int length) {
    +    this.length = length;
    +    this.keys = new ColumnarArray(keys, offset, length);
    +    this.values = new ColumnarArray(values, offset, length);
    +  }
    +
    +  @Override
    +  public int numElements() { return length; }
    --- End diff --
    
    `numElements` or `length`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/437/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86868 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86868/testReport)** for PR 20450 at commit [`2603d02`](https://github.com/apache/spark/commit/2603d024fb9d70d21b96c325d720788e3c075b3a).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/421/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86872 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86872/testReport)** for PR 20450 at commit [`48c275f`](https://github.com/apache/spark/commit/48c275f6562653b259ba007906d94563f5ba010d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/424/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    I found that we don't enable `getMap` API in `MutableColumnarRow` in this change, should we do it? If so, I can make a small follow-up PR for it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    LGTM only some nits and naming issues.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86868/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    **[Test build #86896 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86896/testReport)** for PR 20450 at commit [`182b404`](https://github.com/apache/spark/commit/182b4041e31b55c2c07c14813a66d6904ae61c19).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165245334
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnVector.java ---
    @@ -215,10 +215,18 @@ public final ColumnarRow getStruct(int rowId) {
     
       /**
        * Returns the map type value for rowId.
    +   *
    +   * In Spark, map type value is basically a key data array and a value data array. A key from the
    +   * key array with a index and a value from the value array with the same index contribute to
    +   * an entry of this map type value.
    +   *
    +   * To support map type, implementations must construct an {@link ColumnarMap} and return it in
    --- End diff --
    
    Sure. LGTM.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20450: [SPARK-23280][SQL] add map type support to Column...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20450#discussion_r165242401
  
    --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java ---
    @@ -530,7 +530,7 @@ public int putByteArray(int rowId, byte[] value, int offset, int length) {
       @Override
       protected void reserveInternal(int newCapacity) {
         int oldCapacity = (nulls == 0L) ? 0 : capacity;
    -    if (isArray()) {
    +    if (isArray() || type instanceof MapType) {
    --- End diff --
    
    This might be an overkill, `isArray` needs to take care of many types, but `isMap` we only accept one type: MapType.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20450: [SPARK-23280][SQL] add map type support to ColumnVector

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20450
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org