Posted to reviews@spark.apache.org by icexelloss <gi...@git.apache.org> on 2018/01/11 20:45:48 UTC
[GitHub] spark pull request #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to Nu...
GitHub user icexelloss opened a pull request:
https://github.com/apache/spark/pull/20239
[SPARK-23047][PYTHON][SQL] Change MapVector to NullableMapVector in ArrowColumnVector
## What changes were proposed in this pull request?
This PR changes usage of `MapVector` in the Spark codebase to use `NullableMapVector`.
`MapVector` is an internal Arrow class that is not supposed to be used directly. We should use `NullableMapVector` instead.
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/icexelloss/spark arrow-map-vector
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20239.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20239
----
commit 0e59098575f0e614ecac4bf22dd21da838b241de
Author: Li Jin <ic...@...>
Date: 2018-01-11T20:43:53Z
Change MapVector to NullableMapVector in ArrowColumnVector
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20239
@BryanCutler Any comments on this?
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20239
**[Test build #86043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86043/testReport)** for PR 20239 at commit [`e068966`](https://github.com/apache/spark/commit/e0689666d77f0b62656c90ed11ba244c9fee4328).
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20239
**[Test build #85989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85989/testReport)** for PR 20239 at commit [`0e59098`](https://github.com/apache/spark/commit/0e59098575f0e614ecac4bf22dd21da838b241de).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20239
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20239
`MapVector` is still used in Arrow internal code, but it should not be returned to users directly. https://github.com/apache/arrow/blob/master/java/vector/src/main/java/org/apache/arrow/vector/types/Types.java#L134
@BryanCutler Do you agree?
I also added a test, "non nullable struct", in `ArrowColumnVectorSuite`.
---
[GitHub] spark pull request #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to Nu...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20239
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20239
@BryanCutler Yes, there is no error currently. This should make the code cleaner, though.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20239
**[Test build #85989 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85989/testReport)** for PR 20239 at commit [`0e59098`](https://github.com/apache/spark/commit/0e59098575f0e614ecac4bf22dd21da838b241de).
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20239
**[Test build #86048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86048/testReport)** for PR 20239 at commit [`ab2a309`](https://github.com/apache/spark/commit/ab2a309ac8e900db50a73b87769537c5290c2363).
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20239
**[Test build #86048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86048/testReport)** for PR 20239 at commit [`ab2a309`](https://github.com/apache/spark/commit/ab2a309ac8e900db50a73b87769537c5290c2363).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20239
I'm not sure we can change to `NullableMapVector`; I'm just worried about whether a plain `MapVector` really never occurs here.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20239
Merged build finished. Test PASSed.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/20239
Merged to master and branch-2.3.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20239
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85989/
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20239
@BryanCutler I think this came up in the Arrow sync yesterday.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20239
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86043/
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20239
@ueshin and @BryanCutler I took another look, and the class `StructAccessor` defined in `ArrowColumnVector` is never used for `getStruct`. The `ArrowColumnVector.getStruct()` method just calls `ColumnVector.getStruct()`, which does the right thing. `StructAccessor` is used for `isNullAt` and does the right thing there.
The branch here: https://github.com/apache/spark/blob/master/sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L250 does get taken. As @BryanCutler mentioned, this is because `MapVector` is a parent of `NullableMapVector`, and `NullableMapVector` is the class that actually gets passed in.
@ueshin With regard to naming: in Arrow 0.8 the "Nullable" prefix was removed from most vector classes, with the exception of `MapVector`, which we plan to clean up in later releases.
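To illustrate the point about the `instanceof` branch, here is a minimal sketch of why both the old and the new check match the object that is actually passed in. The classes `MapVectorStandIn` and `NullableMapVectorStandIn` and the two `chooseAccessor*` helpers are hypothetical stand-ins written for this example; they are not the Arrow or Spark classes, and the string return values only stand in for selecting an accessor.

```java
// Hypothetical stand-ins for illustration only; NOT the real Arrow classes.
class AccessorDispatchSketch {
    // Plays the role of the parent class (MapVector in the discussion).
    static class MapVectorStandIn {}

    // Plays the role of the subclass that actually gets passed in.
    static class NullableMapVectorStandIn extends MapVectorStandIn {}

    // Old-style check: matches the parent type, so it also matches the subclass.
    static String chooseAccessorOld(Object vector) {
        if (vector instanceof MapVectorStandIn) {
            return "struct";
        }
        return "unsupported";
    }

    // New-style check: matches only the subclass.
    static String chooseAccessorNew(Object vector) {
        if (vector instanceof NullableMapVectorStandIn) {
            return "struct";
        }
        return "unsupported";
    }

    public static void main(String[] args) {
        Object nullable = new NullableMapVectorStandIn();
        Object plain = new MapVectorStandIn();

        System.out.println(chooseAccessorOld(nullable)); // prints "struct"
        System.out.println(chooseAccessorNew(nullable)); // prints "struct"
        System.out.println(chooseAccessorOld(plain));    // prints "struct"
        System.out.println(chooseAccessorNew(plain));    // prints "unsupported"
    }
}
```

In this sketch the two checks differ only when an instance of the parent type itself is passed in, which matches the concern raised earlier in the thread: the narrower check is safe as long as the parent type is never handed to the dispatch code directly.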
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20239
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86048/
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:
https://github.com/apache/spark/pull/20239
Btw, I don't mean to block this PR, but just out of curiosity, why does only `MapVector` have a `Nullable` version?
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20239
**[Test build #86043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86043/testReport)** for PR 20239 at commit [`e068966`](https://github.com/apache/spark/commit/e0689666d77f0b62656c90ed11ba244c9fee4328).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20239
Thanks to everyone for the review!
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:
https://github.com/apache/spark/pull/20239
cc @BryanCutler @ueshin
---
[GitHub] spark issue #20239: [SPARK-23047][PYTHON][SQL] Change MapVector to NullableM...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20239
Merged build finished. Test PASSed.
---