Posted to reviews@spark.apache.org by HyukjinKwon <gi...@git.apache.org> on 2018/09/04 10:05:57 UTC

[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

GitHub user HyukjinKwon opened a pull request:

    https://github.com/apache/spark/pull/22329

    [SPARK-25328][PYTHON] Add an example for having two columns as the grouping key in group aggregate pandas UDF

    ## What changes were proposed in this pull request?
    
    This PR proposes adding another example that uses multiple grouping keys in a group aggregate pandas UDF, since this feature may still confuse users.
    
    ## How was this patch tested?
    
    Manually tested and documentation built:
    
    ![screen shot 2018-09-04 at 6 00 25 pm](https://user-images.githubusercontent.com/6477701/45025076-1d0b3e00-b06d-11e8-8708-e523e00204c4.png)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-25328

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22329.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22329
    
----
commit 36a7ccc37374a42a2c9cf67f3f1748df638eb937
Author: hyukjinkwon <gu...@...>
Date:   2018-09-04T10:00:34Z

    Add an example for having two columns as the grouping key in group aggregate pandas UDF

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Merged to master, thanks @HyukjinKwon. I just saw that branch-2.4 was already cut; I'll see if I can figure out how to merge there too.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    **[Test build #95666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95666/testReport)** for PR 22329 at commit [`36a7ccc`](https://github.com/apache/spark/commit/36a7ccc37374a42a2c9cf67f3f1748df638eb937).


---



[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22329


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95690/


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    **[Test build #95690 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95690/testReport)** for PR 22329 at commit [`2ad350c`](https://github.com/apache/spark/commit/2ad350c79bd2004282a43d6d189a828cad54cc60).


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    merged to branch-2.4


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2885/


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2851/


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95666/


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95734/


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    cc @gatorsmile and @BryanCutler 


---



[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22329#discussion_r214940744
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2804,6 +2804,20 @@ def pandas_udf(f=None, returnType=None, functionType=None):
            |  1|1.5|
            |  2|6.0|
            +---+---+
    +       >>> @pandas_udf("id long, v1 double, v2 double", PandasUDFType.GROUPED_MAP)  # doctest: +SKIP
    --- End diff --
    
    It took me a while to realize `v1` is a grouping key. It is also a bit uncommon to use a double value as a grouping key. How about something like this?
    
    `id long, additional_key long, v double`


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    **[Test build #95734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95734/testReport)** for PR 22329 at commit [`1f342aa`](https://github.com/apache/spark/commit/1f342aa7158bc2440f504b7cb47b692fcdcce41d).


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    LGTM


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    **[Test build #95734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95734/testReport)** for PR 22329 at commit [`1f342aa`](https://github.com/apache/spark/commit/1f342aa7158bc2440f504b7cb47b692fcdcce41d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/2833/


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    **[Test build #95666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95666/testReport)** for PR 22329 at commit [`36a7ccc`](https://github.com/apache/spark/commit/36a7ccc37374a42a2c9cf67f3f1748df638eb937).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22329#discussion_r215345817
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2804,6 +2804,22 @@ def pandas_udf(f=None, returnType=None, functionType=None):
            |  1|1.5|
            |  2|6.0|
            +---+---+
    +       >>> @pandas_udf(
    +       ...    "id long, additional_key double, v double",
    --- End diff --
    
    Sorry, I know you just changed it, but I think naming the column "ceil(v1 / 2)" with the type `long` would be a little clearer. That said, "additional_key" is OK too, if you want to keep it.
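
To make the naming discussion concrete, here is a minimal plain-Python sketch of grouping on two keys, where the second key is derived as ceil(v / 2). It does not use Spark or pandas and is not the code from the patch: the sample rows and the names `rows`, `group_by_two_keys`, and `sum_per_group` are hypothetical, chosen only for illustration. It shows why an integer (`long`) type is a natural fit for the derived key column.

```python
import math
from collections import defaultdict

# Hypothetical sample rows of (id, v); assumed for illustration only.
rows = [(1, 1.0), (1, 2.0), (2, 3.0), (2, 5.0), (2, 10.0)]


def group_by_two_keys(rows):
    """Group the v values by the pair (id, ceil(v / 2))."""
    groups = defaultdict(list)
    for id_, v in rows:
        key = (id_, math.ceil(v / 2))  # math.ceil returns an int
        groups[key].append(v)
    return groups


def sum_per_group(key, values):
    """Stand-in for a UDF body: the two key columns followed by the sum of v."""
    return key + (sum(values),)


# One output row per distinct (id, ceil(v / 2)) pair, sorted for readability.
result = sorted(sum_per_group(k, vs) for k, vs in group_by_two_keys(rows).items())
for row in result:
    print(row)
```

Each output row carries both grouping keys ahead of the aggregated value, so a reader can see at a glance which columns formed the key.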


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Thanks guys :-)


---



[GitHub] spark pull request #22329: [SPARK-25328][PYTHON] Add an example for having t...

Posted by icexelloss <gi...@git.apache.org>.
Github user icexelloss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22329#discussion_r215267320
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2804,6 +2804,22 @@ def pandas_udf(f=None, returnType=None, functionType=None):
            |  1|1.5|
            |  2|6.0|
            +---+---+
    +       >>> @pandas_udf(
    +       ...    "id long, additional_key double, v double",
    --- End diff --
    
    Do you mind changing the type of `additional_key` to long? The type coercion here does not seem necessary.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    **[Test build #95690 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95690/testReport)** for PR 22329 at commit [`2ad350c`](https://github.com/apache/spark/commit/2ad350c79bd2004282a43d6d189a828cad54cc60).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22329: [SPARK-25328][PYTHON] Add an example for having two colu...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22329
  
    Merged build finished. Test PASSed.


---
