You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maropu <gi...@git.apache.org> on 2018/01/05 15:44:33 UTC

[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...

GitHub user maropu opened a pull request:

    https://github.com/apache/spark/pull/20166

    [SPARK-22973][SQL] Fix incorrect results of Casting Map to String

    ## What changes were proposed in this pull request?
    This pr fixed the issue when casting maps into strings;
    ```
    scala> Seq(Map(1 -> "a", 2 -> "b")).toDF("a").write.saveAsTable("t")
    scala> sql("SELECT cast(a as String) FROM t").show(false)
    +----------------------------------------------------------------+
    |a                                                               |
    +----------------------------------------------------------------+
    |org.apache.spark.sql.catalyst.expressions.UnsafeMapData@38bdd75d|
    +----------------------------------------------------------------+
    ```
    This pr modified the result into;
    ```
    +----------------+
    |a               |
    +----------------+
    |[1 -> a, 2 -> b]|
    +----------------+
    ```
    
    ## How was this patch tested?
    Added tests in `CastSuite`.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/spark SPARK-22973

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20166
    
----
commit efc28e3aa2a0add8e325d60162090c9ee3fdfcba
Author: Takeshi Yamamuro <ya...@...>
Date:   2018-01-05T14:33:14Z

    Cast maps to strings

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    ok, I'll fix struct in a next following pr first.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    BTW, the current `Datset.showString` prints rows thru `RowEncoder` deserializers like;
    ```
    scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
    +------------------------------------------------------------+
    |a                                                           |
    +------------------------------------------------------------+
    |[WrappedArray(1, 2), WrappedArray(3), WrappedArray(4, 5, 6)]|
    +------------------------------------------------------------+
    ```
    If [we  cast them before prints](https://github.com/apache/spark/compare/master...maropu:CastToStringInShowString), we could get more simpler forms like;
    ```
    scala> Seq(Seq(Seq(1, 2), Seq(3), Seq(4, 5, 6))).toDF("a").show(false)
    +------------------------+
    |a                       |
    +------------------------+
    |[[1, 2], [3], [4, 5, 6]]|
    +------------------------+
    ```
    I'm not sure through, is this acceptable? (Probably, we might need to add a option to keep the old behaviour)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20166#discussion_r160016007
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -228,6 +228,35 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
             builder.append("]")
             builder.build()
           })
    +    case MapType(kt, vt, _) =>
    +      buildCast[MapData](_, map => {
    +        val builder = new UTF8StringBuilder
    +        builder.append("[")
    +        if (map.numElements > 0) {
    +          val keyToUTF8String = castToString(kt)
    +          val valueToUTF8String = castToString(vt)
    +          builder.append(keyToUTF8String(map.keyArray().get(0, kt)).asInstanceOf[UTF8String])
    --- End diff --
    
    `map.keyArray()` and `map.valueArray` appear many times, we can create 2 local variables for them.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    **[Test build #85727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85727/testReport)** for PR 20166 at commit [`be04e64`](https://github.com/apache/spark/commit/be04e64733d6051864d6597420d2c982c72606e6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    LGTM except one minor comment


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    **[Test build #85726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85726/testReport)** for PR 20166 at commit [`efc28e3`](https://github.com/apache/spark/commit/efc28e3aa2a0add8e325d60162090c9ee3fdfcba).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85727/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    @cloud-fan 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    **[Test build #85751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85751/testReport)** for PR 20166 at commit [`fb11796`](https://github.com/apache/spark/commit/fb1179698c5a4cdfd13bbc4fa2a0ceda07fe43c9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85751/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...

Posted by maropu <gi...@git.apache.org>.
Github user maropu commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20166#discussion_r160022276
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala ---
    @@ -228,6 +228,35 @@ case class Cast(child: Expression, dataType: DataType, timeZoneId: Option[String
             builder.append("]")
             builder.build()
           })
    +    case MapType(kt, vt, _) =>
    +      buildCast[MapData](_, map => {
    +        val builder = new UTF8StringBuilder
    +        builder.append("[")
    +        if (map.numElements > 0) {
    +          val keyToUTF8String = castToString(kt)
    +          val valueToUTF8String = castToString(vt)
    +          builder.append(keyToUTF8String(map.keyArray().get(0, kt)).asInstanceOf[UTF8String])
    --- End diff --
    
    ok


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    **[Test build #85727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85727/testReport)** for PR 20166 at commit [`be04e64`](https://github.com/apache/spark/commit/be04e64733d6051864d6597420d2c982c72606e6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20166: [SPARK-22973][SQL] Fix incorrect results of Casti...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20166


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    **[Test build #85726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85726/testReport)** for PR 20166 at commit [`efc28e3`](https://github.com/apache/spark/commit/efc28e3aa2a0add8e325d60162090c9ee3fdfcba).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85726/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    **[Test build #85751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85751/testReport)** for PR 20166 at commit [`fb11796`](https://github.com/apache/spark/commit/fb1179698c5a4cdfd13bbc4fa2a0ceda07fe43c9).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    merging to master/2.3!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20166: [SPARK-22973][SQL] Fix incorrect results of Casting Map ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20166
  
    I think this is a bug, in `Dataset.showString` I see code like
    ```
    case seq: Seq[_] => seq.mkString("[", ", ", "]")
    ```
    Which means we do want to show strings like `[[1, 2], [3], [4, 5, 6]]`
    
    Anyway let's fix in another PR, I'm merging this PR first


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org