You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2018/01/05 15:44:33 UTC
[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...
GitHub user maropu opened a pull request:
https://github.com/apache/spark/pull/20166
[SPARK-22973][SQL] Fix incorrect results of Casting Map to String
## What changes were proposed in this pull request?
This pr fixed the issue when casting maps into strings;
```
scala> Seq(Map(1 -> "a", 2 -> "b")).toDF("a").write.saveAsTable("t")
scala> sql("SELECT cast(a as String) FROM t").show(false)
+----------------------------------------------------------------+
|a |
+----------------------------------------------------------------+
|org.apache.spark.sql.catalyst.expressions.UnsafeMapData@38bdd75d|
+----------------------------------------------------------------+
```
This pr modified the result into;
```
+----------------+
|a |
+----------------+
|[1 -> a, 2 -> b]|
+----------------+
```
## How was this patch tested?
Added tests in `CastSuite`.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/maropu/spark SPARK-22973
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/20166.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #20166
----
commit efc28e3aa2a0add8e325d60162090c9ee3fdfcba
Author: Takeshi Yamamuro <ya...@...>
Date: 2018-01-05T14:33:14Z
Cast maps to strings
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/20166
ok, I'll fix struct in a next following pr first.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/20166
BTW, the current `Datset.showString` prints rows thru `RowEncoder` deserializers like;
```
scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
+------------------------------------------------------------+
|a |
+------------------------------------------------------------+
|[WrappedArray(1, 2), WrappedArray(3), WrappedArray(4, 5, 6)]|
+------------------------------------------------------------+
```
If [we cast them before prints](https://github.com/apache/spark/compare/master...maropu:CastToStringInShowString), we could get more simpler forms like;
```
scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
+------------------------+
|a |
+------------------------+
|[[1, 2], [3], [4, 5, 6]]|
+------------------------+
```
I'm not sure through, is this acceptable? (Probably, we might need to add a option to keep the old behaviour)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/20166#discussion_r160016007
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -228,6 +228,35 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
builder.append("]")
builder.build()
})
+ case MapType(kt, vt, _) =>
+ buildCast[MapData](_, map => {
+ val builder = new UTF8StringBuilder
+ builder.append("[")
+ if (map.numElements > 0) {
+ val keyToUTF8String = castToString(kt)
+ val valueToUTF8String = castToString(vt)
+ builder.append(keyToUTF8String(map.keyArray().get(0, kt)).asInstanceOf[UTF8String])
--- End diff --
`map.keyArray()` and `map.valueArray` appear many times, we can create 2 local variables for them.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20166
**[Test build #85727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85727/testReport)** for PR 20166 at commit [`be04e64`](https://github.com/apache/spark/commit/be04e64733d6051864d6597420d2c982c72606e6).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20166
LGTM except one minor comment
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20166
**[Test build #85726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85726/testReport)** for PR 20166 at commit [`efc28e3`](https://github.com/apache/spark/commit/efc28e3aa2a0add8e325d60162090c9ee3fdfcba).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20166
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85727/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:
https://github.com/apache/spark/pull/20166
@cloud-fan
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20166
**[Test build #85751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85751/testReport)** for PR 20166 at commit [`fb11796`](https://github.com/apache/spark/commit/fb1179698c5a4cdfd13bbc4fa2a0ceda07fe43c9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20166
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20166
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85751/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...
Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:
https://github.com/apache/spark/pull/20166#discussion_r160022276
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
@@ -228,6 +228,35 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
builder.append("]")
builder.build()
})
+ case MapType(kt, vt, _) =>
+ buildCast[MapData](_, map => {
+ val builder = new UTF8StringBuilder
+ builder.append("[")
+ if (map.numElements > 0) {
+ val keyToUTF8String = castToString(kt)
+ val valueToUTF8String = castToString(vt)
+ builder.append(keyToUTF8String(map.keyArray().get(0, kt)).asInstanceOf[UTF8String])
--- End diff --
ok
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20166
**[Test build #85727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85727/testReport)** for PR 20166 at commit [`be04e64`](https://github.com/apache/spark/commit/be04e64733d6051864d6597420d2c982c72606e6).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/20166
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20166
**[Test build #85726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85726/testReport)** for PR 20166 at commit [`efc28e3`](https://github.com/apache/spark/commit/efc28e3aa2a0add8e325d60162090c9ee3fdfcba).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20166
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20166
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85726/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/20166
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/20166
**[Test build #85751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85751/testReport)** for PR 20166 at commit [`fb11796`](https://github.com/apache/spark/commit/fb1179698c5a4cdfd13bbc4fa2a0ceda07fe43c9).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20166
merging to master/2.3!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/20166
I think this is a bug, in `Dataset.showString` I see code like
```
case seq: Seq[_] => seq.mkString("[", ", ", "]")
```
Which means we do want to show strings like `[[1, 2], [3], [4, 5, 6]]`
Anyway let's fix in another PR, I'm merging this PR first
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org