Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/10/25 03:09:48 UTC

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/19570

    [SPARK-22335][SQL] Clarify union behavior on Dataset of typed objects in the document

    ## What changes were proposed in this pull request?
    
    It seems that end users can be confused by the behavior of `union` on a Dataset of typed objects. We can clarify it further in the documentation of the `union` function.
    
    ## How was this patch tested?
    
    Only document change.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-22335

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19570
    
----
commit eab627ae9860442597d6a038a19b6f63a10f23e4
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-10-25T02:55:52Z

    Add a notice to the documentation of `union` to clarify its usage with typed objects.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147117297
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects:
    +   *
    +   * {{{
    +   *   case class Test(a : String, b : String)
    --- End diff --
    
    `(a : String, b : String)` -> `(a: String, b: String)` maybe?


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83162/
    Test PASSed.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Merged build finished. Test PASSed.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147544758
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects, as this Scala example shows:
    +   *
    +   * {{{
    +   *   case class Test(a: String, b: String)
    +   *   val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: [a: String, b: String]
    +   *   val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: [b: String, a: String]
    +   *   ds1.union(ds2).show
    +   *
    +   *   // output:
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // |  b|  a|
    +   *   // +---+---+
    --- End diff --
    
    Sorry, but this is the example of `union`. Which example do you mean to use here? Do you mean the example of `unionByName`?
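To make the contrast between resolving columns by position and by name concrete without a Spark session, here is a minimal sketch in plain Scala. It is a toy model, not Spark's implementation: `Table`, `UnionModel.unionByPosition` and `UnionModel.unionByName` are hypothetical names standing in for `Dataset.union` and `Dataset.unionByName`, and the data mirrors the `ds1`/`ds2` doc example in the diff above.

```scala
// Hypothetical toy model (NOT Spark code) of two column-resolution strategies.
// A table is a list of column names plus rows of string values.
final case class Table(columns: Vector[String], rows: Vector[Vector[String]])

object UnionModel {
  // By position: the result keeps the left table's column names, and the i-th
  // value of every right-hand row lands in the left table's i-th column.
  def unionByPosition(left: Table, right: Table): Table = {
    require(left.columns.size == right.columns.size, "column counts must match")
    Table(left.columns, left.rows ++ right.rows)
  }

  // By name: each right-hand row is reordered so that its values line up with
  // the left table's column names before the rows are appended.
  def unionByName(left: Table, right: Table): Table = {
    require(left.columns.sorted == right.columns.sorted, "column names must match")
    val positions = left.columns.map(name => right.columns.indexOf(name))
    Table(left.columns, left.rows ++ right.rows.map(row => positions.map(i => row(i))))
  }
}

object UnionExample {
  // Same data as the doc example: identical column names, different order.
  val ds1 = Table(Vector("a", "b"), Vector(Vector("a", "b"))) // schema [a, b]
  val ds2 = Table(Vector("b", "a"), Vector(Vector("b", "a"))) // schema [b, a]

  val byPosition: Table = UnionModel.unionByPosition(ds1, ds2)
  val byName: Table = UnionModel.unionByName(ds1, ds2)
}
```

With this data, the positional version reproduces the swapped second row (`b`, `a`) from the proposed Scaladoc output, while the by-name version realigns it to (`a`, `b`).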


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83083/testReport)** for PR 19570 at commit [`2d0b8ae`](https://github.com/apache/spark/commit/2d0b8ae1760918091e4a62d1de2495d285bbe7fc).


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147544853
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects, as this Scala example shows:
    +   *
    +   * {{{
    +   *   case class Test(a: String, b: String)
    +   *   val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: [a: String, b: String]
    +   *   val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: [b: String, a: String]
    +   *   ds1.union(ds2).show
    +   *
    +   *   // output:
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // |  b|  a|
    +   *   // +---+---+
    --- End diff --
    
    If I understand your comment correctly, you mean we don't need to add this example to explain how `union` works on typed objects, and a plain `union` example is enough.
    
    I'm OK with that, although from past experience, many end users get confused by this difference.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    @HyukjinKwon Thanks for reviewing.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83157/testReport)** for PR 19570 at commit [`1de4e13`](https://github.com/apache/spark/commit/1de4e13c0dec2c6754f91329e53e6d92274faf7c).


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83029/
    Test PASSed.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147118974
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects:
    +   *
    +   * {{{
    +   *   case class Test(a : String, b : String)
    --- End diff --
    
    Yes, thanks.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    retest this please.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147544733
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects, as this Scala example shows:
    +   *
    +   * {{{
    +   *   case class Test(a: String, b: String)
    +   *   val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: [a: String, b: String]
    +   *   val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: [b: String, a: String]
    +   *   ds1.union(ds2).show
    +   *
    +   *   // output:
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // |  b|  a|
    +   *   // +---+---+
    --- End diff --
    
    Please use the same example as `union`, so that users can easily see the difference between `union` and `unionByName`.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147464434
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects, as this Scala example shows:
    +   *
    +   * {{{
    +   *   case class Test(a: String, b: String)
    +   *   val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: [a: String, b: String]
    +   *   val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: [b: String, a: String]
    +   *   ds1.union(ds2).show
    +   *
    +   *   // output:
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // |  b|  a|
    +   *   // +---+---+
    --- End diff --
    
    Please use the same example as `union`. You just need to add a comment explaining that it also applies to strongly-typed JVM objects.
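The point about strongly-typed JVM objects can be sketched the same way. Assuming, as the `Dataset.as` documentation describes for case classes, that fields are matched to columns by name, a positional union that keeps the left-hand schema silently swaps the fields of objects coming from the right-hand side. `TypedRows`, `TypedUnionModel` and `TypedUnionExample` are hypothetical names for this toy model only; no Spark API is used.

```scala
// Hypothetical toy model (NOT Spark code) of a typed Dataset: column names plus
// rows of string values, decoded into objects by column *name*, the way
// `.as[Test]` maps case-class fields to columns.
final case class Test(a: String, b: String)

final case class TypedRows(schema: Vector[String], rows: Vector[Vector[String]]) {
  // Look each field up by its column name, not by its position in the row.
  def objects: Vector[Test] = {
    val ia = schema.indexOf("a")
    val ib = schema.indexOf("b")
    rows.map(row => Test(row(ia), row(ib)))
  }
}

object TypedUnionModel {
  // Positional union: keep the left schema and append the right rows untouched.
  def union(left: TypedRows, right: TypedRows): TypedRows = {
    require(left.schema.size == right.schema.size, "column counts must match")
    TypedRows(left.schema, left.rows ++ right.rows)
  }
}

object TypedUnionExample {
  val ds1 = TypedRows(Vector("a", "b"), Vector(Vector("a", "b"))) // schema [a, b]
  val ds2 = TypedRows(Vector("b", "a"), Vector(Vector("b", "a"))) // schema [b, a]

  // Decoded separately, both inputs hold Test("a", "b").
  val decodedSeparately: Vector[Test] = ds1.objects ++ ds2.objects
  // After a positional union, ds2's row is read through ds1's schema.
  val decodedAfterUnion: Vector[Test] = TypedUnionModel.union(ds1, ds2).objects
}
```

In this model, `ds2` on its own decodes to `Test("a", "b")`, but after the positional union the same row decodes to `Test("b", "a")`, which matches the second row of the proposed Scaladoc output.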


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83029/testReport)** for PR 19570 at commit [`eab627a`](https://github.com/apache/spark/commit/eab627ae9860442597d6a038a19b6f63a10f23e4).


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83162/testReport)** for PR 19570 at commit [`1de4e13`](https://github.com/apache/spark/commit/1de4e13c0dec2c6754f91329e53e6d92274faf7c).


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Thanks @HyukjinKwon @gatorsmile 


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83151/
    Test FAILed.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147119002
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects:
    +   *
    +   * {{{
    --- End diff --
    
    Sure.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83157/
    Test FAILed.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83083/testReport)** for PR 19570 at commit [`2d0b8ae`](https://github.com/apache/spark/commit/2d0b8ae1760918091e4a62d1de2495d285bbe7fc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   *   case class Test(a: String, b: String)`


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83143/testReport)** for PR 19570 at commit [`977c78e`](https://github.com/apache/spark/commit/977c78e50e00c1fd7aa8f54220a3f44de7bf8c5a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   * this Scala example shows (using Scala case class for example, it is also applicable`


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83151 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83151/testReport)** for PR 19570 at commit [`1de4e13`](https://github.com/apache/spark/commit/1de4e13c0dec2c6754f91329e53e6d92274faf7c).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Merged build finished. Test FAILed.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83157/testReport)** for PR 19570 at commit [`1de4e13`](https://github.com/apache/spark/commit/1de4e13c0dec2c6754f91329e53e6d92274faf7c).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    retest this please.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83143/testReport)** for PR 19570 at commit [`977c78e`](https://github.com/apache/spark/commit/977c78e50e00c1fd7aa8f54220a3f44de7bf8c5a).


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83151 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83151/testReport)** for PR 19570 at commit [`1de4e13`](https://github.com/apache/spark/commit/1de4e13c0dec2c6754f91329e53e6d92274faf7c).


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83029/testReport)** for PR 19570 at commit [`eab627a`](https://github.com/apache/spark/commit/eab627ae9860442597d6a038a19b6f63a10f23e4).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `   *   case class Test(a : String, b : String)`


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147540587
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects, as this Scala example shows:
    +   *
    +   * {{{
    +   *   case class Test(a: String, b: String)
    +   *   val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: [a: String, b: String]
    +   *   val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: [b: String, a: String]
    +   *   ds1.union(ds2).show
    +   *
    +   *   // output:
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // |  b|  a|
    +   *   // +---+---+
    --- End diff --
    
    Added a comment saying that it also applies to strongly-typed JVM objects.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83143/
    Test PASSed.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19570


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Merged build finished. Test FAILed.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147117969
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects:
    +   *
    +   * {{{
    --- End diff --
    
    Could we clarify that this is a Scala example?


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    LGTM


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Merged build finished. Test PASSed.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83083/
    Test PASSed.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by HyukjinKwon <gi...@git.apache.org>.

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    Merged to master.


---

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

Posted by viirya <gi...@git.apache.org>.

Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19570#discussion_r147540274
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
    @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql](
        *
        * Also as standard in SQL, this function resolves columns by position (not by name).
        *
    +   * Notice that the column positions in the schema aren't necessarily matched with the
    +   * fields in the typed objects in a Dataset. This function resolves columns by their positions
    +   * in the schema, not the fields in the typed objects, as this Scala example shows:
    +   *
    +   * {{{
    +   *   case class Test(a: String, b: String)
    +   *   val ds1 = Seq(("a", "b")).toDF("a", "b").as[Test] // ds1's schema: [a: String, b: String]
    +   *   val ds2 = Seq(("b", "a")).toDF("b", "a").as[Test] // ds2's schema: [b: String, a: String]
    +   *   ds1.union(ds2).show
    +   *
    +   *   // output:
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // +---+---+
    +   *   // |  a|  b|
    +   *   // |  b|  a|
    +   *   // +---+---+
    --- End diff --
    
    Sorry, I don't understand what you mean by the same example as `union`. This is the only example of `union`, unless I'm missing something.


---

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19570
  
    **[Test build #83162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83162/testReport)** for PR 19570 at commit [`1de4e13`](https://github.com/apache/spark/commit/1de4e13c0dec2c6754f91329e53e6d92274faf7c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
