Posted to reviews@spark.apache.org by icexelloss <gi...@git.apache.org> on 2018/01/11 20:45:48 UTC

[GitHub] spark pull request #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to Nu...

GitHub user icexelloss opened a pull request:

    https://github.com/apache/spark/pull/20239

    [SPARK-23047][PYTHON][SQL] Change MapVector to NullableMapVector in ArrowColumnVector

    ## What changes were proposed in this pull request?
    This PR changes usages of `MapVector` in the Spark codebase to `NullableMapVector`.
    
    `MapVector` is an internal Arrow class that is not supposed to be used directly. We should use `NullableMapVector` instead.
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/icexelloss/spark arrow-map-vector

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20239
    
----
commit 0e59098575f0e614ecac4bf22dd21da838b241de
Author: Li Jin <ic...@...>
Date:   2018-01-11T20:43:53Z

    Change MapVector to NullableMapVector in ArrowColumnVector

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    @BryanCutler Any comments on this?


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    **[Test build #86043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86043/testReport)** for PR 20239 at commit [`e068966`](https://github.com/apache/spark/commit/e0689666d77f0b62656c90ed11ba244c9fee4328).


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    **[Test build #85989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85989/testReport)** for PR 20239 at commit [`0e59098`](https://github.com/apache/spark/commit/0e59098575f0e614ecac4bf22dd21da838b241de).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    `MapVector` is still used in Arrow internal code, but it should not be returned to the user directly. https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/types/Types.java#L134
    
    @BryanCutler Do you agree?
    
    I also added a test, "non nullable struct", to `ArrowColumnVectorSuite`.


---



[GitHub] spark pull request #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to Nu...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20239


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    @BryanCutler Yes, there is no error currently. This should make the code cleaner, though.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    **[Test build #85989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85989/testReport)** for PR 20239 at commit [`0e59098`](https://github.com/apache/spark/commit/0e59098575f0e614ecac4bf22dd21da838b241de).


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    **[Test build #86048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86048/testReport)** for PR 20239 at commit [`ab2a309`](https://github.com/apache/spark/commit/ab2a309ac8e900db50a73b87769537c5290c2363).


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    **[Test build #86048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86048/testReport)** for PR 20239 at commit [`ab2a309`](https://github.com/apache/spark/commit/ab2a309ac8e900db50a73b87769537c5290c2363).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    I'm not sure we can change to `NullableMapVector`; I'm just worried about whether a plain `MapVector` really never occurs here.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Merged to master and branch-2.3.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85989/
    Test PASSed.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    @BryanCutler I think this came up in the Arrow sync yesterday.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86043/
    Test PASSed.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    @ueshin and @BryanCutler I took another look: the `StructAccessor` class defined in `ArrowColumnVector` is never used for `getStruct`. The `ArrowColumnVector.getStruct()` method just calls `ColumnVector.getStruct()`, which does the right thing. `StructAccessor` is used for `isNullAt`, and it does the right thing there as well.
    
    The branch here is taken: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L250. As @BryanCutler mentioned, this is because `MapVector` is a parent of `NullableMapVector`, and `NullableMapVector` is the class that actually gets passed in.
    
    @ueshin With regard to naming: in Arrow 0.8 the "Nullable" prefix was removed from most vector classes, with the exception of `MapVector`, which we plan to clean up in later releases.
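
A minimal sketch of why a check against the parent type matches when the child class is what gets passed in. The classes below are hypothetical stand-ins that only reproduce the parent/child relationship described above; they are not the real Arrow classes, and `pickByParent` / `pickByChild` are illustrative names, not methods in `ArrowColumnVector`.

```java
// Hypothetical stand-ins: only the inheritance relationship is modelled.
class ValueVector {}

class MapVector extends ValueVector {}

// As discussed above, MapVector is a parent of NullableMapVector.
class NullableMapVector extends MapVector {}

class AccessorDispatchSketch {
    // Checks the parent type: matches MapVector and any subclass of it.
    static String pickByParent(ValueVector v) {
        if (v instanceof MapVector) {
            return "StructAccessor";
        }
        throw new UnsupportedOperationException("no accessor for " + v.getClass());
    }

    // Checks the child type: matches only NullableMapVector (and its subclasses).
    static String pickByChild(ValueVector v) {
        if (v instanceof NullableMapVector) {
            return "StructAccessor";
        }
        throw new UnsupportedOperationException("no accessor for " + v.getClass());
    }

    public static void main(String[] args) {
        ValueVector passedIn = new NullableMapVector();
        // Both checks accept the child instance, so switching the check
        // does not change behaviour when the child is what gets passed in.
        System.out.println(pickByParent(passedIn));
        System.out.println(pickByChild(passedIn));
    }
}
```

Under these assumptions the two checks differ only for a plain `MapVector` instance, which the child-type check rejects.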


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86048/
    Test PASSed.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Btw, I don't mean to block this PR, but just out of curiosity: why does only `MapVector` have a `Nullable` version?


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    **[Test build #86043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86043/testReport)** for PR 20239 at commit [`e068966`](https://github.com/apache/spark/commit/e0689666d77f0b62656c90ed11ba244c9fee4328).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Thanks, everyone, for the review!


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    cc @BryanCutler @ueshin 


---



[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20239
  
    Merged build finished. Test PASSed.


---
