Posted to reviews@spark.apache.org by map222 <gi...@git.apache.org> on 2017/05/05 02:04:48 UTC

[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

GitHub user map222 opened a pull request:

    https://github.com/apache/spark/pull/17865

    [SPARK-20456][Docs] Add examples for functions collection for pyspark

    ## What changes were proposed in this pull request?
    
    This PR adds documentation examples to many functions in `pyspark/sql/functions.py`:
    `upper`, `lower`, `reverse`, `unix_timestamp`, `from_unixtime`, `rand`, `randn`, `collect_list`, `collect_set`, `lit`.
    It also adds units to the trigonometry functions, renames columns in the datetime examples to be more informative, and adds links between related functions.
    
    ## How was this patch tested?
    
    `./dev/lint-python`
    `python python/pyspark/sql/functions.py`
    `./python/run-tests.py --module pyspark-sql`
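
    The examples this PR adds are doctests, which `./python/run-tests.py` executes and compares against the recorded output. A minimal stand-in using the stdlib `doctest` module (the `reverse_string` function below is illustrative only, not part of PySpark):

    ```python
    import doctest

    def reverse_string(s):
        """Return ``s`` reversed.

        Doctest examples in the style of those added to
        pyspark/sql/functions.py:

        >>> reverse_string('spark')
        'kraps'
        >>> reverse_string('')
        ''
        """
        return s[::-1]

    # testmod() runs every ``>>>`` example in this module's docstrings and
    # reports how many failed to reproduce the recorded output.
    results = doctest.testmod()
    print(results.failed)  # prints 0 when every example matches
    ```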

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/map222/spark spark-20456

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17865
    
----
commit 91515c620287e193c6d208038025fe194740e4d2
Author: Michael Patterson <ma...@gmail.com>
Date:   2017-05-05T00:26:56Z

    First revision: trigonometry units, lit, collect_set, collect_list, unix_timestamp, from_unixtime

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Ok, I removed `ignore_unicode_prefix`, and the tests passed without it. I also removed the date column renames where the original name was fine, and removed the string function documentation.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79207 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79207/testReport)** for PR 17865 at commit [`9cc34c8`](https://github.com/apache/spark/commit/9cc34c895eaad5ba4366308a0324c8adb2f9510e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #76625 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76625/testReport)** for PR 17865 at commit [`ca8b5f7`](https://github.com/apache/spark/commit/ca8b5f7d666bd13a515ba1358e4f69ff13df9711).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79165/testReport)** for PR 17865 at commit [`60af595`](https://github.com/apache/spark/commit/60af5959a340068ea82d4e96c92b634959494f93).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123962727
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -200,17 +225,20 @@ def _():
     @since(1.3)
     def approxCountDistinct(col, rsd=None):
         """
    -    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
    +    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct` instead.
         """
         return approx_count_distinct(col, rsd)
     
     
     @since(2.1)
     def approx_count_distinct(col, rsd=None):
    -    """Returns a new :class:`Column` for approximate distinct count of ``col``.
    +    """Aggregate function. Returns a new :class:`Column` for approximate distinct count of column `col`.
    --- End diff --
    
    Little nit: `. R` -> `: r`.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78586/testReport)** for PR 17865 at commit [`9e1e9e1`](https://github.com/apache/spark/commit/9e1e9e19bcfaf0a3fb5bf3264058b46573877b76).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    @HyukjinKwon I could not get the tests to pass without the unicode prefixes and `ignore_unicode_prefix`




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Build finished. Test PASSed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114929689
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -456,7 +479,7 @@ def monotonically_increasing_id():
     def nanvl(col1, col2):
         """Returns col1 if it is not NaN, or col2 if col1 is NaN.
     
    -    Both inputs should be floating point columns (DoubleType or FloatType).
    +    Both inputs should be floating point columns (:class:`DoubleType` or FloatType).
    --- End diff --
    
    I think we should link both `DoubleType` and `FloatType`, or neither.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78031 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78031/testReport)** for PR 17865 at commit [`8c34c8b`](https://github.com/apache/spark/commit/8c34c8b9073863178ff74c18a01dcf0db32e6805).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78158 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78158/testReport)** for PR 17865 at commit [`29bcb67`](https://github.com/apache/spark/commit/29bcb67eb85bf79781d3a3bf1cb35cdcfe433f5c).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79343/testReport)** for PR 17865 at commit [`7dbf35f`](https://github.com/apache/spark/commit/7dbf35f2331d2363191850b3f4e13bf58241bad2).




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121483327
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -67,9 +67,15 @@ def _():
         _.__doc__ = 'Window function: ' + doc
         return _
     
    +_lit_doc = """
    +    Creates a :class:`Column` of literal value.
     
    +    >>> df.withColumn('height', lit(5) ).withColumn('spark_user', lit(True) ).collect()
    --- End diff --
    
    nit: remove extra spaces after `lit(5)` and `lit(True)`.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76625/
    Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123182491
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -109,15 +118,33 @@ def _():
         'rint': 'Returns the double value that is closest in value to the argument and' +
                 ' is equal to a mathematical integer.',
         'signum': 'Computes the signum of the given value.',
    -    'sin': 'Computes the sine of the given value.',
    +    'sin': """Computes the sine of the given value.
    +
    +           :param col: :class:`DoubleType` column, units in radians.""",
         'sinh': 'Computes the hyperbolic sine of the given value.',
    -    'tan': 'Computes the tangent of the given value.',
    +    'tan': """Computes the tangent of the given value.
    +
    +           :param col: :class:`DoubleType` column, units in radians.""",
         'tanh': 'Computes the hyperbolic tangent of the given value.',
    -    'toDegrees': '.. note:: Deprecated in 2.1, use degrees instead.',
    -    'toRadians': '.. note:: Deprecated in 2.1, use radians instead.',
    +    'toDegrees': '.. note:: Deprecated in 2.1, use :func:`degrees` instead.',
    +    'toRadians': '.. note:: Deprecated in 2.1, use :func:`radians` instead.',
         'bitwiseNOT': 'Computes bitwise not.',
     }
     
    +_collect_list_doc = """
    +    Aggregate function: returns a list of objects with duplicates.
    +
    +    >>> df2 = spark.createDataFrame([('Alice', 2), ('Bob', 5), ('Alice', 99)], ('name', 'age'))
    +    >>> df2.agg(collect_list('name')).collect()
    --- End diff --
    
    I think we can avoid `ignore_unicode_prefix` if we call this function with `age` for this and the one below.
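
    The reason the numeric column avoids `ignore_unicode_prefix`: on Python 2, strings collected back from Spark repr with a `u` prefix (`u'Alice'`), so a doctest's recorded output cannot match both Python 2 and Python 3, while integers repr identically on both. A plain-Python sketch of the difference (no Spark required; the lists below merely stand in for collected rows):

    ```python
    # Stand-ins for values collected from a DataFrame: on Python 2 the
    # string list would repr as [u'Alice', u'Bob', u'Alice'], while the
    # integer list reprs as [2, 5, 99] under both Python 2 and Python 3,
    # so a doctest over the 'age' column is version-independent.
    collected_names = ['Alice', 'Bob', 'Alice']
    collected_ages = [2, 5, 99]

    print(repr(collected_ages))  # prints [2, 5, 99] on both major versions
    ```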




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79162 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79162/testReport)** for PR 17865 at commit [`f5d0d0f`](https://github.com/apache/spark/commit/f5d0d0f2d1649305213e50d9e76180f449027a4a).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79163/
    Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79343/
    Test PASSed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #76625 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76625/testReport)** for PR 17865 at commit [`ca8b5f7`](https://github.com/apache/spark/commit/ca8b5f7d666bd13a515ba1358e4f69ff13df9711).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79118/
    Test PASSed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #76477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76477/testReport)** for PR 17865 at commit [`91515c6`](https://github.com/apache/spark/commit/91515c620287e193c6d208038025fe194740e4d2).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79343/testReport)** for PR 17865 at commit [`7dbf35f`](https://github.com/apache/spark/commit/7dbf35f2331d2363191850b3f4e13bf58241bad2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78029/testReport)** for PR 17865 at commit [`a61173f`](https://github.com/apache/spark/commit/a61173f2172cc716c76abf2e14ca21ff77ab08b2).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78031/
    Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r125518551
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -148,7 +148,8 @@ setMethod("asin",
     
     #' atan
     #'
    -#' Computes the tangent inverse of the given value.
    +#' Computes the tangent inverse of the given value; the returned angle is in the range
    +#' -pi/2 through pi/2
    --- End diff --
    
    this has been moved in master - you should see when you rebase



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121857687
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -189,15 +210,15 @@ def _():
     }
     
     for _name, _doc in _functions.items():
    -    globals()[_name] = since(1.3)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.3)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    `ignore_unicode_prefix` is necessary to get the tests to pass both python 2 and python 3. I had problems without it on my previous documentation PR: https://github.com/apache/spark/pull/17469#issuecomment-295896057
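For context, a minimal sketch of what such a decorator can do (an illustration under assumed behavior, not pyspark's actual implementation): on Python 3 it rewrites `u'...'` literals in a function's docstring, so doctest output written in Python 2 style (`Row(name=u'Alice')`) still matches.

```python
import re
import sys


def ignore_unicode_prefix_sketch(f):
    # On Python 3, strip the u prefix from string literals in the
    # docstring so doctests written with Python 2 style output
    # (e.g. Row(name=u'Alice')) still match. Simplified illustration.
    if sys.version_info[0] >= 3 and f.__doc__ is not None:
        f.__doc__ = re.sub(r"\bu(['\"])", r"\1", f.__doc__)
    return f


@ignore_unicode_prefix_sketch
def lower_sketch(col):
    """
    >>> df.select(lower(df.name)).collect()
    [Row(lower(name)=u'alice'), Row(lower(name)=u'bob')]
    """
    return col.lower()
```

Applying the decorator to every generated function (as the diff does) makes all the doctests portable across Python versions, at the cost of running the regex over docstrings that never contained a unicode literal.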



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Yes, or you can merge master and fix the conflict.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    How about https://github.com/apache/spark/pull/17865/files#r123964695 ?



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    @map222 Unfortunately, our PySpark did not follow what we did in Scala. Will review it more carefully in the future. Thanks! 



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79163 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79163/testReport)** for PR 17865 at commit [`e7935d1`](https://github.com/apache/spark/commit/e7935d15f0728af6ecd74373f6c71d2239360815).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78189/
    Test PASSed.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #76763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76763/testReport)** for PR 17865 at commit [`dd7a397`](https://github.com/apache/spark/commit/dd7a3971275aae97f9adfd82166df23b038eb950).



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79118/testReport)** for PR 17865 at commit [`f17f332`](https://github.com/apache/spark/commit/f17f332dd97b948f8dd31eb2b18c1e11dc7fead0).
     * This patch passes all tests.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121859623
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -189,15 +210,15 @@ def _():
     }
     
     for _name, _doc in _functions.items():
    -    globals()[_name] = since(1.3)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.3)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    Yea, I didn't mean we shouldn't use it, but that we should avoid applying it to all the other docs (more specifically the ones in `_functions`). It basically replaces the u prefix via a regex and is rather a workaround IMHO. So, I think we should avoid applying this to doctests that don't use unicode.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78183 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78183/testReport)** for PR 17865 at commit [`89138ea`](https://github.com/apache/spark/commit/89138ea41bf27c8dd6549ef6c38395f3ccf6d843).



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Also cc @ueshin 



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123963627
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -282,8 +309,7 @@ def corr(col1, col2):
     
     @since(2.0)
     def covar_pop(col1, col2):
    -    """Returns a new :class:`Column` for the population covariance of ``col1``
    -    and ``col2``.
    +    """Returns a new :class:`Column` for the population covariance of `col1` and `col2`.
    --- End diff --
    
    little nit: `` `col1` `` -> ``` ``col1`` ``` and the same one below.



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121536038
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -92,14 +98,16 @@ def _():
     _functions_1_4 = {
         # unary math functions
         'acos': 'Computes the cosine inverse of the given value; the returned angle is in the range' +
    -            '0.0 through pi.',
    -    'asin': 'Computes the sine inverse of the given value; the returned angle is in the range' +
    -            '-pi/2 through pi/2.',
    -    'atan': 'Computes the tangent inverse of the given value.',
    +            '0.0 through pi.\n\n:param col: float column, units in radians.',
    --- End diff --
    
    Here it uses `\n` whereas the one below does not. Let's make them consistent.
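The `\n` placement matters because of the pattern the diff edits: pyspark builds these wrappers from a dict of name-to-docstring, so any `:param:` notes must be embedded in the string itself. A simplified sketch with hypothetical names (`_create_function_sketch`, `_functions_sketch`), not the real module:

```python
# Simplified sketch of the generated-wrapper pattern: docstrings live in
# a dict, so '\n\n' is what separates the summary line from the
# ':param:' block in the rendered docs.
def _create_function_sketch(name, doc):
    def _(col):
        return "%s(%s)" % (name, col)  # stand-in for the real JVM call
    _.__name__ = name
    _.__doc__ = doc
    return _


_functions_sketch = {
    'acos': 'Computes the cosine inverse of the given value; the returned angle is in the range '
            '0.0 through pi.\n\n:param col: float column.',
}

for _name, _doc in _functions_sketch.items():
    globals()[_name] = _create_function_sketch(_name, _doc)
```

With string concatenation across lines, a missing `\n` silently runs the parameter note into the summary, which is why consistency here is more than cosmetic.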



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121539034
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -189,15 +210,15 @@ def _():
     }
     
     for _name, _doc in _functions.items():
    -    globals()[_name] = since(1.3)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.3)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    Hm, this then applies `ignore_unicode_prefix` to all docs ... This looks only added for `_lit_doc`. Could we simply avoid the unicode example in the lit doc rather than applying this to all function docs?



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79163 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79163/testReport)** for PR 17865 at commit [`e7935d1`](https://github.com/apache/spark/commit/e7935d15f0728af6ecd74373f6c71d2239360815).



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78586/testReport)** for PR 17865 at commit [`9e1e9e1`](https://github.com/apache/spark/commit/9e1e9e19bcfaf0a3fb5bf3264058b46573877b76).
     * This patch **fails PySpark pip packaging tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78029/testReport)** for PR 17865 at commit [`a61173f`](https://github.com/apache/spark/commit/a61173f2172cc716c76abf2e14ca21ff77ab08b2).



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123184373
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -267,8 +296,7 @@ def coalesce(*cols):
     
     @since(1.6)
     def corr(col1, col2):
    -    """Returns a new :class:`Column` for the Pearson Correlation Coefficient for ``col1``
    -    and ``col2``.
    +    """Returns a new :class:`Column` for the Pearson Correlation Coefficient for `col1` and `col2`.
    --- End diff --
    
    Honestly, this is rather unnecessary change. Both before and after are fine and do not affect the documentation rendering.



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121539617
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -206,17 +227,20 @@ def _():
     @since(1.3)
     def approxCountDistinct(col, rsd=None):
         """
    -    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
    +    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct` instead.
         """
         return approx_count_distinct(col, rsd)
     
     
     @since(2.1)
     def approx_count_distinct(col, rsd=None):
    -    """Returns a new :class:`Column` for approximate distinct count of ``col``.
    +    """Returns a new :class:`Column` for approximate distinct count of column `col`.
    --- End diff --
    
    Sounds like it also needs leading `Aggregate function: ` doc - https://github.com/apache/spark/blob/3a840048ed3501e06260b7c5df18cc0bbdb1505c/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L225.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Could you please check the documents we did in Scala APIs? It sounds like we forgot to update the Python function descriptions when we did the change in the Scala APIs. 



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79189/testReport)** for PR 17865 at commit [`9cc34c8`](https://github.com/apache/spark/commit/9cc34c895eaad5ba4366308a0324c8adb2f9510e).



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78183 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78183/testReport)** for PR 17865 at commit [`89138ea`](https://github.com/apache/spark/commit/89138ea41bf27c8dd6549ef6c38395f3ccf6d843).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121860425
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1254,23 +1294,41 @@ def hash(*cols):
     
     # ---------------------- String/Binary functions ------------------------------
     
    +_lower_doc = """
    +    Converts a string column to lower case.
    +
    +    >>> df.select(lower(df.name) ).collect()
    +    [Row(lower(name)=u'alice'), Row(lower(name)=u'bob')]
    +    """
    +_upper_doc = """
    +    Converts a string column to upper case.
    +
    +    >>> df.select(upper(df.name) ).collect()
    +    [Row(upper(name)=u'ALICE'), Row(upper(name)=u'BOB')]
    +    """
    +_reverse_doc = """
    +    Reverses the string column and returns it as a new string column.
    +
    +    >>> df.select(reverse(df.name) ).collect()
    +    [Row(reverse(name)=u'ecilA'), Row(reverse(name)=u'boB')]
    +    """
     _string_functions = {
         'ascii': 'Computes the numeric value of the first character of the string column.',
         'base64': 'Computes the BASE64 encoding of a binary column and returns it as a string column.',
         'unbase64': 'Decodes a BASE64 encoded string column and returns it as a binary column.',
         'initcap': 'Returns a new string column by converting the first letter of each word to ' +
                    'uppercase. Words are delimited by whitespace.',
    -    'lower': 'Converts a string column to lower case.',
    -    'upper': 'Converts a string column to upper case.',
    -    'reverse': 'Reverses the string column and returns it as a new string column.',
    +    'lower': _lower_doc,
    +    'upper': _upper_doc,
    +    'reverse': _reverse_doc,
         'ltrim': 'Trim the spaces from left end for the specified string value.',
         'rtrim': 'Trim the spaces from right end for the specified string value.',
         'trim': 'Trim the spaces from both ends for the specified string column.',
     }
     
     
     for _name, _doc in _string_functions.items():
    -    globals()[_name] = since(1.5)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.5)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    I think it is helpful to be clear on how `lower` and `upper` should be used. This week I tried to use `lower` as a column function, rather than as an outside function. I can remove the docstrings if you think avoiding `ignore_unicode_prefix` is important.



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79164 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79164/testReport)** for PR 17865 at commit [`2c9898b`](https://github.com/apache/spark/commit/2c9898bff7f4b65a303ac2f679cd390700b531ef).



[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114929441
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -153,7 +173,7 @@ def _():
     # math functions that take two arguments as input
     _binary_mathfunctions = {
         'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
    -             'polar coordinates (r, theta).',
    +             'polar coordinates (r, theta). Units in radians.',
    --- End diff --
    
    I am not sure we should note this for every instance, or that users really get confused, since these use Scala/Java's built-in library. I wonder if there is an example of a library that handles this differently?
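The Scala/Java built-in in question (`java.lang.Math.atan2`) behaves the same as Python's stdlib `math.atan2`: the returned angle theta is in radians, in the range (-pi, pi]. A quick stdlib check of the unit being discussed:

```python
import math

# atan2(y, x) converts rectangular (x, y) to the polar angle theta.
# The result is in radians, not degrees.
theta = math.atan2(1.0, 1.0)  # a point on the line y = x
assert math.isclose(theta, math.pi / 4)

# Converting makes the unit explicit: pi/4 radians is 45 degrees.
assert math.isclose(math.degrees(theta), 45.0)
```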



[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78188 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78188/testReport)** for PR 17865 at commit [`e1bb723`](https://github.com/apache/spark/commit/e1bb723a04900704889fa0c66a14723193898bf0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r116550583
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -153,7 +173,7 @@ def _():
     # math functions that take two arguments as input
     _binary_mathfunctions = {
         'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
    -             'polar coordinates (r, theta).',
    +             'polar coordinates (r, theta). Units in radians.',
    --- End diff --
    
    @HyukjinKwon Any opinion on how to do the `:param:` for the units?




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76763/
    Test PASSed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123965739
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -473,10 +503,15 @@ def rand(seed=None):
         return Column(jc)
     
     
    +@ignore_unicode_prefix
     @since(1.4)
     def randn(seed=None):
         """Generates a column with independent and identically distributed (i.i.d.) samples from
         the standard normal distribution.
    +
    +    >>> df.withColumn('randn', randn(seed=42) ).collect()
    --- End diff --
    
    little nit: `) )` -> `))`




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    @HyukjinKwon @gatorsmile bump




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121542536
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -793,9 +824,9 @@ def date_format(date, format):
         .. note:: Use when ever possible specialized functions like `year`. These benefit from a
             specialized implementation.
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
    -    >>> df.select(date_format('a', 'MM/dd/yyy').alias('date')).collect()
    -    [Row(date=u'04/08/2015')]
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
    +    >>> df.select(date_format('dt', 'MM/dd/yyy').alias('dt2')).collect()
    --- End diff --
    
    I think `date` was also fine.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79165/
    Test PASSed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121860061
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -189,15 +210,15 @@ def _():
     }
     
     for _name, _doc in _functions.items():
    -    globals()[_name] = since(1.3)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.3)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    And... I think the easiest way is to take out the examples, or to avoid unicode in the doctests for now, as `ignore_unicode_prefix` requires a function.
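
    For readers unfamiliar with the diff context, the pattern being discussed is generating module-level functions from a dict and wrapping each one in decorators. A minimal sketch is below; `_create_function`, `since`, and `ignore_unicode_prefix` here are simplified stand-ins for the pyspark internals, not their real implementations:

    ```python
    # Stand-in for pyspark's factory: build a stub function carrying the doc.
    def _create_function(name, doc):
        def _(col):
            return '%s(%s)' % (name, col)   # placeholder body, not a real Column
        _.__name__ = name
        _.__doc__ = doc
        return _

    # Stand-in for pyspark's @since decorator: append a versionadded note.
    def since(version):
        def deco(f):
            f.__doc__ = (f.__doc__ or '') + '\n.. versionadded:: %s' % version
            return f
        return deco

    # The real decorator rewrites u'...' prefixes in doctest output; it takes
    # a function object, which is why it must wrap each generated function
    # individually rather than the dict of docstrings.
    def ignore_unicode_prefix(f):
        return f

    _functions = {'lower': 'Converts a string column to lower case.'}

    for _name, _doc in _functions.items():
        globals()[_name] = since(1.3)(ignore_unicode_prefix(_create_function(_name, _doc)))
    ```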




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121495497
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1254,23 +1294,41 @@ def hash(*cols):
     
     # ---------------------- String/Binary functions ------------------------------
     
    +_lower_doc = """
    +    Converts a string column to lower case.
    +
    +    >>> df.select(lower(df.name) ).collect()
    +    [Row(lower(name)=u'alice'), Row(lower(name)=u'bob')]
    +    """
    +_upper_doc = """
    +    Converts a string column to upper case.
    +
    +    >>> df.select(upper(df.name) ).collect()
    --- End diff --
    
    nit: remove an extra space.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123182922
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -67,9 +67,15 @@ def _():
         _.__doc__ = 'Window function: ' + doc
         return _
     
    +_lit_doc = """
    +    Creates a :class:`Column` of literal value.
     
    +    >>> df.withColumn('height', lit(5)).withColumn('spark_user', lit(True)).collect()
    +    [Row(age=2, name=u'Alice', height=5, spark_user=True),
    +     Row(age=5, name=u'Bob', height=5, spark_user=True)]
    --- End diff --
    
    Here too, I think we can just do something like `df.select('height', lit(5)).collect()` or `show()` to avoid `ignore_unicode_prefix`.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78159 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78159/testReport)** for PR 17865 at commit [`b4c04e7`](https://github.com/apache/spark/commit/b4c04e762e122f575292af7aa00934bb60501aa2).




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121537735
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -92,14 +98,16 @@ def _():
     _functions_1_4 = {
         # unary math functions
         'acos': 'Computes the cosine inverse of the given value; the returned angle is in the range' +
    -            '0.0 through pi.',
    -    'asin': 'Computes the sine inverse of the given value; the returned angle is in the range' +
    -            '-pi/2 through pi/2.',
    -    'atan': 'Computes the tangent inverse of the given value.',
    +            '0.0 through pi.\n\n:param col: float column, units in radians.',
    +    'asin': """Computes the sine inverse of the given value; the returned angle is in the range
    +            -pi/2 through pi/2.
    +
    +            :param col: float column, units in radians.""",
    --- End diff --
    
    Yea, I personally think adding param looks better. I would say a :class:\`DoubleType\` column or not specify the type (e.g., "column to compute the sine inverse on"). Not a strong opinion.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121495519
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1254,23 +1294,41 @@ def hash(*cols):
     
     # ---------------------- String/Binary functions ------------------------------
     
    +_lower_doc = """
    +    Converts a string column to lower case.
    +
    +    >>> df.select(lower(df.name) ).collect()
    +    [Row(lower(name)=u'alice'), Row(lower(name)=u'bob')]
    +    """
    +_upper_doc = """
    +    Converts a string column to upper case.
    +
    +    >>> df.select(upper(df.name) ).collect()
    +    [Row(upper(name)=u'ALICE'), Row(upper(name)=u'BOB')]
    +    """
    +_reverse_doc = """
    +    Reverses the string column and returns it as a new string column.
    +
    +    >>> df.select(reverse(df.name) ).collect()
    --- End diff --
    
    nit: remove an extra space.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79162/
    Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123184878
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1073,12 +1109,17 @@ def last_day(date):
         return Column(sc._jvm.functions.last_day(_to_java_column(date)))
     
     
    +@ignore_unicode_prefix
     @since(1.5)
     def from_unixtime(timestamp, format="yyyy-MM-dd HH:mm:ss"):
         """
         Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
         representing the timestamp of that moment in the current system time zone in the given
         format.
    +
    +    >>> time_df = spark.createDataFrame([(1428476400, )], ['unix_time'])
    +    >>> time_df.select(from_unixtime('unix_time').alias('ts') ).collect()
    --- End diff --
    
    nit: `) )` -> `))`.
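
    As an aside, the epoch value used in this doctest can be sanity-checked with the standard library. Note the pyspark doctest renders it in the system time zone, so its output differs from the UTC value below by the local offset:

    ```python
    from datetime import datetime, timezone

    # 1428476400 seconds after the unix epoch (1970-01-01 00:00:00 UTC),
    # rendered in UTC for a deterministic result.
    ts = datetime.fromtimestamp(1428476400, tz=timezone.utc)
    print(ts.strftime('%Y-%m-%d %H:%M:%S'))   # 2015-04-08 07:00:00
    ```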




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115415128
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -153,7 +173,7 @@ def _():
     # math functions that take two arguments as input
     _binary_mathfunctions = {
         'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
    -             'polar coordinates (r, theta).',
    +             'polar coordinates (r, theta). Units in radians.',
    --- End diff --
    
    I see. What do you think about adding this in `:param`?




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    I switched the examples to numeric ones so that we could avoid using `ignore_unicode_prefix`, as requested by @HyukjinKwon in [this commit](https://github.com/apache/spark/pull/17865/commits/0ab91297c09628377fffc04b3f54216bf2581a89#diff-f5295f69bfbdbf6e161aed54057ea36dL144)




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123964695
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1116,12 +1160,12 @@ def from_utc_timestamp(timestamp, tz):
     @since(1.5)
     def to_utc_timestamp(timestamp, tz):
         """
    -    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
    --- End diff --
    
    Let's revert this change, or make the same change in Scala/R too if it matters.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121857727
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -92,14 +98,16 @@ def _():
     _functions_1_4 = {
         # unary math functions
         'acos': 'Computes the cosine inverse of the given value; the returned angle is in the range' +
    -            '0.0 through pi.',
    -    'asin': 'Computes the sine inverse of the given value; the returned angle is in the range' +
    -            '-pi/2 through pi/2.',
    -    'atan': 'Computes the tangent inverse of the given value.',
    +            '0.0 through pi.\n\n:param col: float column, units in radians.',
    +    'asin': """Computes the sine inverse of the given value; the returned angle is in the range
    +            -pi/2 through pi/2.
    +
    +            :param col: float column, units in radians.""",
    --- End diff --
    
    I was trying two different ways to see which worked best. I now use the `:param col:` version
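
    Illustrative only: one way the settled-on `:param col:` style might read for a unary trig function. The exact wording below is a sketch, not the merged pyspark text:

    ```python
    # Hypothetical docstring entry in the _functions_1_4 dict, showing the
    # ':param col:' style for documenting units rather than an inline note.
    _functions_1_4 = {
        'asin': """Computes the sine inverse of the given value; the returned angle
                is in the range -pi/2 through pi/2.

                :param col: column to compute the sine inverse on; the returned
                    angle is in radians.""",
    }

    print(_functions_1_4['asin'])
    ```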




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79207/
    Test PASSed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115134581
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -409,7 +432,7 @@ def isnan(col):
     
     @since(1.6)
     def isnull(col):
    -    """An expression that returns true iff the column is null.
    +    """An expression that returns true if the column is null.
    --- End diff --
    
    `the column`? This is misleading. We should make the Python documents consistent with what we did in Scala. 
    For example, `isNull` in Scala APIs is described as
    > Returns true if `expr` is null, or false otherwise.
    
    Ref: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/nullExpressions.scala#L280




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r125536602
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -1900,8 +1901,8 @@ setMethod("year",
     
     #' @details
     #' \code{atan2}: Returns the angle theta from the conversion of rectangular coordinates
    -#' (x, y) to polar coordinates (r, theta).
    -#'
    +#' (x, y) to polar coordinates (r, theta). Units in radians.
    +#
    --- End diff --
    
    Sorry... missing `'`...




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78188 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78188/testReport)** for PR 17865 at commit [`e1bb723`](https://github.com/apache/spark/commit/e1bb723a04900704889fa0c66a14723193898bf0).




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r125201828
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1073,12 +1108,17 @@ def last_day(date):
         return Column(sc._jvm.functions.last_day(_to_java_column(date)))
     
     
    +@ignore_unicode_prefix
     @since(1.5)
     def from_unixtime(timestamp, format="yyyy-MM-dd HH:mm:ss"):
         """
         Converts the number of seconds from unix epoch (1970-01-01 00:00:00 UTC) to a string
         representing the timestamp of that moment in the current system time zone in the given
         format.
    +
    +    >>> time_df = spark.createDataFrame([(1428476400, )], ['unix_time'])
    --- End diff --
    
    little nit: `(1428476400, )` -> `(1428476400,)`
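A quick aside on the Python semantics behind this nit: the trailing comma, not the parentheses, is what makes a one-element tuple, which is why the doctest rows are written this way. A minimal stdlib sketch:

```python
# The trailing comma is what makes a one-element tuple; parentheses alone do not.
single = (1428476400,)      # tuple with one element
not_a_tuple = (1428476400)  # just an int wrapped in parentheses

assert isinstance(single, tuple) and len(single) == 1
assert isinstance(not_a_tuple, int)

# This is why createDataFrame rows in the doctest look like [(1428476400,)]:
rows = [(1428476400,)]
assert rows[0][0] == 1428476400
```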




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    The most recent commit reverts "if" back to "iff", changes the double backticks around column names to single backticks, and tries the new `:param:` option for the angle columns.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115415714
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
     @since(1.5)
     def to_utc_timestamp(timestamp, tz):
         """
    -    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
    -    another timestamp that corresponds to the same time of day in UTC.
    +    Given a `timestamp`, which corresponds to a time of day in the timezone `tz`,
    --- End diff --
    
    No, I don't think we have a rule about this, to my knowledge. Thank you for the pointers and for looking into this. Let's follow the majority then for now.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    LGTM too.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78188/
    Test PASSed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123967003
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -95,10 +100,13 @@ def _():
                 '0.0 through pi.',
         'asin': 'Computes the sine inverse of the given value; the returned angle is in the range' +
                 '-pi/2 through pi/2.',
    -    'atan': 'Computes the tangent inverse of the given value.',
    +    'atan': 'Computes the tangent inverse of the given value; the returned angle is in the range' +
    +            '-pi/2 through pi/2',
    --- End diff --
    
    Let's revert this change or add the same information in Scala/R too.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121495462
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1254,23 +1294,41 @@ def hash(*cols):
     
     # ---------------------- String/Binary functions ------------------------------
     
    +_lower_doc = """
    +    Converts a string column to lower case.
    +
    +    >>> df.select(lower(df.name) ).collect()
    --- End diff --
    
    nit: remove an extra space.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114930599
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -910,8 +941,8 @@ def weekofyear(col):
         """
         Extract the week number of a given date as integer.
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
    -    >>> df.select(weekofyear(df.a).alias('week')).collect()
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['time'])
    --- End diff --
    
    Let's use `d` for `DateType` or `datetime.date`, similarly to the other existing names.
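For context, the week number in that doctest can be reproduced with the stdlib, assuming Spark's `weekofyear` follows ISO-8601 week numbering (which matches the doctest output of 15 for this date):

```python
from datetime import date

# date.isocalendar() returns (iso_year, iso_week, iso_weekday) under ISO-8601
# week numbering, which appears to match Spark's weekofyear for this example.
d = date(2015, 4, 8)
iso_year, iso_week, iso_weekday = d.isocalendar()
assert iso_week == 15      # same value as the weekofyear doctest
assert iso_weekday == 3    # 2015-04-08 was a Wednesday
```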




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123964972
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1258,7 +1302,7 @@ def hash(*cols):
                    'uppercase. Words are delimited by whitespace.',
         'lower': 'Converts a string column to lower case.',
         'upper': 'Converts a string column to upper case.',
    -    'reverse': 'Reverses the string column and returns it as a new string column.',
    +    'reverse': 'Reverses a string column and returns it as a new string column.',
    --- End diff --
    
    ditto. I guess we should revert this change or match this change to R/Scala ones too if it matters.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115385050
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -793,8 +824,8 @@ def date_format(date, format):
         .. note:: Use when ever possible specialized functions like `year`. These benefit from a
             specialized implementation.
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
    -    >>> df.select(date_format('a', 'MM/dd/yyy').alias('date')).collect()
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['time'])
    --- End diff --
    
    I am replacing all of the timestamps of the form `yyyy-MM-dd` with `dt`, and all of the `yyyy-MM-dd HH:mm:ss` ones with `ts`, with some exceptions. Hopefully this addresses the point.
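As a sketch of what these doctests compute, the same parse-and-reformat can be done with the stdlib (translating the Java `SimpleDateFormat` pattern `MM/dd/yyy` to its assumed `strftime` equivalent):

```python
from datetime import datetime

# Parse the 'yyyy-MM-dd' string and re-render it month-first, mirroring what
# date_format(dt_column, 'MM/dd/yyy') produces in the doctest.
dt = datetime.strptime('2015-04-08', '%Y-%m-%d')
assert dt.strftime('%m/%d/%Y') == '04/08/2015'
```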




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79164/
    Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78158 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78158/testReport)** for PR 17865 at commit [`29bcb67`](https://github.com/apache/spark/commit/29bcb67eb85bf79781d3a3bf1cb35cdcfe433f5c).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78189 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78189/testReport)** for PR 17865 at commit [`e6cb8a1`](https://github.com/apache/spark/commit/e6cb8a1e830ffa18d6bdbcbeb2aa6456bc6569da).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Yea. For https://github.com/apache/spark/pull/17865#issuecomment-313303801, just to explain the context (as it is a bit confusing even to me), it was suggested to use `ignore_unicode_prefix` for a single instance when the example has a unicode string in the doctest _or_ to remove the unicode from the doctest if possible (either should be fine).
    
    I also suggested to not use it for unrelated docstrings - https://github.com/apache/spark/pull/17865#discussion_r121539034.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78189/testReport)** for PR 17865 at commit [`e6cb8a1`](https://github.com/apache/spark/commit/e6cb8a1e830ffa18d6bdbcbeb2aa6456bc6569da).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114929993
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -67,9 +67,16 @@ def _():
         _.__doc__ = 'Window function: ' + doc
         return _
     
    +_lit_doc = """
    +    Creates a :class:`Column` of literal value. Supports basic types like :class:`IntegerType`,
    +    :class:`FloatType`, :class:`BooleanType`, and :class:`StringType`
    --- End diff --
    
    I would like to keep this identical with the one in `functions.scala` to reduce overhead when someone sweeps the same documentation changes across APIs in other languages.
    
    If the additional information is Python-specific, let's add it in a `.. note::`.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123183863
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -200,17 +226,20 @@ def _():
     @since(1.3)
     def approxCountDistinct(col, rsd=None):
         """
    -    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
    +    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct` instead.
         """
         return approx_count_distinct(col, rsd)
     
     
     @since(2.1)
    -def approx_count_distinct(col, rsd=None):
    -    """Returns a new :class:`Column` for approximate distinct count of ``col``.
    +def approx_count_distinct(col, rsd=0.05):
    --- End diff --
    
    Why do we change this to `0.05`? We would then have to change the default value in both `HyperLogLogPlusPlus` and here whenever it changes in either place. Let's leave it as `None`.
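A minimal sketch of the pass-through pattern being suggested, with a hypothetical `_jvm_approx_count_distinct` stub standing in for the real JVM call (`sc._jvm.functions.approx_count_distinct` in pyspark):

```python
# Hypothetical stub standing in for the JVM-side function; returns its call
# shape so the dispatch is easy to check.
def _jvm_approx_count_distinct(col, rsd=None):
    if rsd is None:
        return ('approx_count_distinct', col)
    return ('approx_count_distinct', col, rsd)

def approx_count_distinct(col, rsd=None):
    # Keeping the Python default as None lets the JVM side own the real
    # default (0.05 inside HyperLogLogPlusPlus), so the two never drift apart.
    if rsd is None:
        return _jvm_approx_count_distinct(col)
    return _jvm_approx_count_distinct(col, rsd)

assert approx_count_distinct('age') == ('approx_count_distinct', 'age')
assert approx_count_distinct('age', 0.1) == ('approx_count_distinct', 'age', 0.1)
```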




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123186254
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -312,7 +338,7 @@ def covar_samp(col1, col2):
     
     @since(1.3)
     def countDistinct(col, *cols):
    -    """Returns a new :class:`Column` for distinct count of ``col`` or ``cols``.
    +    """Returns a new :class:`Column` for distinct count of `col` or `cols`.
    --- End diff --
    
    I guess we did not decide which one of `` `...` `` or ``` ``...`` ``` is preferred and correct. Let's avoid sweeping them. Adding a new one is fine (and I usually follow the majority in this case) but changing the existing ones might not be, if we don't know which one is preferred. We might end up changing them back and forth over time.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121861352
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -962,9 +993,9 @@ def add_months(start, months):
         """
         Returns the date that is `months` months after `start`
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['d'])
    -    >>> df.select(add_months(df.d, 1).alias('d')).collect()
    -    [Row(d=datetime.date(2015, 5, 8))]
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
    --- End diff --
    
    Okay, but let's sweep the other instances here if there are.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Do I just have to `git rebase` on master?




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79189 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79189/testReport)** for PR 17865 at commit [`9cc34c8`](https://github.com/apache/spark/commit/9cc34c895eaad5ba4366308a0324c8adb2f9510e).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79189/
    Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    ok to test




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123179950
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -147,7 +173,7 @@ def _():
     # math functions that take two arguments as input
     _binary_mathfunctions = {
         'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
    -             'polar coordinates (r, theta).',
    +             'polar coordinates (r, theta). Units in radians.',
    --- End diff --
    
    I think we should ...
    
    1. copy and paste `Units in radians.` to both https://github.com/apache/spark/blob/e5387018e76a9af1318e78c4133ee68232e6a159/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1342 and https://github.com/apache/spark/blob/8965fe764a4218d944938aa4828072f1ad9dbda7/R/pkg/R/functions.R#L2030 just to be consistent
    
    2. Use a Python-specific notation to describe this here for now.
    
    3. Just revert this change for now.
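For reference, the stdlib `math.atan2` demonstrates the radians convention being documented here:

```python
import math

# atan2(y, x) converts rectangular (x, y) to the polar angle theta, in radians:
# the point (1, 1) sits at 45 degrees, i.e. pi/4 radians.
theta = math.atan2(1.0, 1.0)
assert abs(theta - math.pi / 4) < 1e-12

# The returned angle lies in (-pi, pi]; the point (-1, 0) maps to pi.
assert abs(math.atan2(0.0, -1.0) - math.pi) < 1e-12
```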




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123941351
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -426,7 +426,7 @@ setMethod("covar_pop", signature(col1 = "characterOrColumn", col2 = "characterOr
     
     #' cos
     #'
    -#' Computes the cosine of the given value.
    +#' Computes the cosine of the given value. Units in radians.
    --- End diff --
    
    looks like this will conflict with #18371





[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114929646
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -397,7 +420,7 @@ def input_file_name():
     
     @since(1.6)
     def isnan(col):
    -    """An expression that returns true iff the column is NaN.
    +    """An expression that returns true if the column is NaN.
    --- End diff --
    
    I think "iff" is the abbreviation for "if and only if". I don't think it is worth changing.
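    The "iff" wording can be illustrated with a plain-Python analogue (a hedged sketch: pyspark's `isnan` operates on columns, but the stdlib `math.isnan` has the same truth table — true exactly for NaN, false for everything else, including infinities):

    ```python
    import math

    # "iff" = "if and only if": isnan is True precisely for NaN inputs.
    values = [float("nan"), 1.5, float("inf"), -0.0]
    flags = [math.isnan(v) for v in values]
    print(flags)  # [True, False, False, False]
    ```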




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123962284
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -426,7 +426,7 @@ setMethod("covar_pop", signature(col1 = "characterOrColumn", col2 = "characterOr
     
     #' cos
     #'
    -#' Computes the cosine of the given value.
    +#' Computes the cosine of the given value. Units in radians.
    --- End diff --
    
    BTW, just while you are here @felixcheung, does this change itself look okay to you?




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78159 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78159/testReport)** for PR 17865 at commit [`b4c04e7`](https://github.com/apache/spark/commit/b4c04e762e122f575292af7aa00934bb60501aa2).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    merged to master. thanks!




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78586/
    Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115415748
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
     @since(1.5)
     def to_utc_timestamp(timestamp, tz):
         """
    -    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
    -    another timestamp that corresponds to the same time of day in UTC.
    +    Given a `timestamp`, which corresponds to a time of day in the timezone `tz`,
    --- End diff --
    
    No, I don't think we have a rule about this up to my knowledge. Thank you for the pointers and looking into this. Let's follow the majority then for now.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121493879
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -466,10 +487,15 @@ def nanvl(col1, col2):
         return Column(sc._jvm.functions.nanvl(_to_java_column(col1), _to_java_column(col2)))
     
     
    +@ignore_unicode_prefix
     @since(1.4)
     def rand(seed=None):
         """Generates a random column with independent and identically distributed (i.i.d.) samples
         from U[0.0, 1.0].
    +
    +    >>> df.withColumn('rand',rand(seed=42) * 3).collect()
    --- End diff --
    
    nit: add a space after `'rand',`.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123184803
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -969,8 +1005,8 @@ def months_between(date1, date2):
         """
         Returns the number of months between date1 and date2.
     
    -    >>> df = spark.createDataFrame([('1997-02-28 10:30:00', '1996-10-30')], ['t', 'd'])
    -    >>> df.select(months_between(df.t, df.d).alias('months')).collect()
    +    >>> df = spark.createDataFrame([('1997-02-28 10:30:00', '1996-10-30')], ['date1', 'date2'])
    --- End diff --
    
    `t` and `d` already look fine.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79164 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79164/testReport)** for PR 17865 at commit [`2c9898b`](https://github.com/apache/spark/commit/2c9898bff7f4b65a303ac2f679cd390700b531ef).
     * This patch **fails some tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123180524
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1258,7 +1303,7 @@ def hash(*cols):
                    'uppercase. Words are delimited by whitespace.',
         'lower': 'Converts a string column to lower case.',
         'upper': 'Converts a string column to upper case.',
    -    'reverse': 'Reverses the string column and returns it as a new string column.',
    +    'reverse': 'Reverses a string column and returns it as a new string column.',
    --- End diff --
    
    ditto. I guess we should match this change to R/Scala ones too.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78159/
    Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115385649
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
     @since(1.5)
     def to_utc_timestamp(timestamp, tz):
         """
    -    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
    -    another timestamp that corresponds to the same time of day in UTC.
    +    Given a `timestamp`, which corresponds to a time of day in the timezone `tz`,
    --- End diff --
    
    I had a quick question about this. The documentation seems pretty inconsistent about when to use \` vs \`\`. For example, in [approx_count_distinct](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.approx_count_distinct), the parameter `col` is referred to with ``, while in [rpad](http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.functions.rpad), the parameters are referred to with `.
    
    Numpy appears to use single backticks for parameters:
    https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt#sections
    
    Is there a reference for this?




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79165/testReport)** for PR 17865 at commit [`60af595`](https://github.com/apache/spark/commit/60af5959a340068ea82d4e96c92b634959494f93).




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123185008
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1105,9 +1150,9 @@ def from_utc_timestamp(timestamp, tz):
         Given a timestamp, which corresponds to a certain time of day in UTC, returns another timestamp
         that corresponds to the same time of day in the given timezone.
     
    -    >>> df = spark.createDataFrame([('1997-02-28 10:30:00',)], ['t'])
    -    >>> df.select(from_utc_timestamp(df.t, "PST").alias('t')).collect()
    -    [Row(t=datetime.datetime(1997, 2, 28, 2, 30))]
    --- End diff --
    
    Renaming `t` to `ts` looks like an unnecessary change here and in the one below.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    @HyukjinKwon I ended up not making examples for the aggregate functions, as I didn't make a good dataframe to demonstrate them. I could add more examples for the string functions if you think that is a good idea. There are dozens of functions that could be documented, I'm not sure how far we want to go, or which ones need it.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76477/
    Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121491673
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -109,15 +117,29 @@ def _():
         'rint': 'Returns the double value that is closest in value to the argument and' +
                 ' is equal to a mathematical integer.',
         'signum': 'Computes the signum of the given value.',
    -    'sin': 'Computes the sine of the given value.',
    -    'sinh': 'Computes the hyperbolic sine of the given value.',
    -    'tan': 'Computes the tangent of the given value.',
    -    'tanh': 'Computes the hyperbolic tangent of the given value.',
    -    'toDegrees': '.. note:: Deprecated in 2.1, use degrees instead.',
    -    'toRadians': '.. note:: Deprecated in 2.1, use radians instead.',
    +    'sin': 'Computes the sine of the given value. Units in radians',
    +    'sinh': 'Computes the hyperbolic sine of the given value. Units in radians.',
    +    'tan': 'Computes the tangent of the given value. Units in radians.',
    +    'tanh': 'Computes the hyperbolic tangent of the given value. Units in radians.',
    +    'toDegrees': '.. note:: Deprecated in 2.1, use :func:`degrees` instead.',
    +    'toRadians': '.. note:: Deprecated in 2.1, use :func:`radians` instead.',
         'bitwiseNOT': 'Computes bitwise not.',
     }
     
    +_collect_list_doc = """
    +    Aggregate function: returns a list of objects with duplicates.
    +
    +    >>> df2 = spark.createDataFrame([('Alice', 2), ('Bob', 5),('Alice', 99)], ('name', 'age'))
    +    >>> df2.agg(collect_list('name')).collect()
    +    [Row(collect_list(name)=[u'Alice', u'Bob', u'Alice'])]
    +    """
    +_collect_set_doc = """
    +    Aggregate function: returns a set of objects with duplicate elements eliminated.
    +
    +    >>> df2 = spark.createDataFrame([('Alice', 2), ('Bob', 5),('Alice', 99)], ('name', 'age'))
    --- End diff --
    
    ditto.
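    The semantic difference the two doctests above demonstrate — `collect_list` keeps duplicates, `collect_set` eliminates them — can be sketched in plain Python (a hedged analogue; the real functions are JVM-backed aggregates over a column):

    ```python
    # Rows matching the doctest's df2: ('name', 'age') pairs.
    rows = [('Alice', 2), ('Bob', 5), ('Alice', 99)]
    names = [name for name, _age in rows]

    collected_list = names              # like collect_list('name'): duplicates kept
    collected_set = sorted(set(names))  # like collect_set('name'); real set order is undefined

    print(collected_list)  # ['Alice', 'Bob', 'Alice']
    print(collected_set)   # ['Alice', 'Bob']
    ```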




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114929803
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1120,12 +1159,12 @@ def from_utc_timestamp(timestamp, tz):
     @since(1.5)
     def to_utc_timestamp(timestamp, tz):
         """
    -    Given a timestamp, which corresponds to a certain time of day in the given timezone, returns
    -    another timestamp that corresponds to the same time of day in UTC.
    +    Given a `timestamp`, which corresponds to a time of day in the timezone `tz`,
    --- End diff --
    
    Should this be ``` ``timestamp`` ``` not `` `timestamp` ``?




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Jenkins, retest this please.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    LGTM.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121860217
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -962,9 +993,9 @@ def add_months(start, months):
         """
         Returns the date that is `months` months after `start`
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['d'])
    -    >>> df.select(add_months(df.d, 1).alias('d')).collect()
    -    [Row(d=datetime.date(2015, 5, 8))]
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
    --- End diff --
    
    In the original example, the column name `d` is reused for the new column. I think using a new column name helps differentiate the original date from the new date.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121543879
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1254,23 +1294,41 @@ def hash(*cols):
     
     # ---------------------- String/Binary functions ------------------------------
     
    +_lower_doc = """
    +    Converts a string column to lower case.
    +
    +    >>> df.select(lower(df.name) ).collect()
    +    [Row(lower(name)=u'alice'), Row(lower(name)=u'bob')]
    +    """
    +_upper_doc = """
    +    Converts a string column to upper case.
    +
    +    >>> df.select(upper(df.name) ).collect()
    +    [Row(upper(name)=u'ALICE'), Row(upper(name)=u'BOB')]
    +    """
    +_reverse_doc = """
    +    Reverses the string column and returns it as a new string column.
    +
    +    >>> df.select(reverse(df.name) ).collect()
    +    [Row(reverse(name)=u'ecilA'), Row(reverse(name)=u'boB')]
    +    """
     _string_functions = {
         'ascii': 'Computes the numeric value of the first character of the string column.',
         'base64': 'Computes the BASE64 encoding of a binary column and returns it as a string column.',
         'unbase64': 'Decodes a BASE64 encoded string column and returns it as a binary column.',
         'initcap': 'Returns a new string column by converting the first letter of each word to ' +
                    'uppercase. Words are delimited by whitespace.',
    -    'lower': 'Converts a string column to lower case.',
    -    'upper': 'Converts a string column to upper case.',
    -    'reverse': 'Reverses the string column and returns it as a new string column.',
    +    'lower': _lower_doc,
    +    'upper': _upper_doc,
    +    'reverse': _reverse_doc,
         'ltrim': 'Trim the spaces from left end for the specified string value.',
         'rtrim': 'Trim the spaces from right end for the specified string value.',
         'trim': 'Trim the spaces from both ends for the specified string column.',
     }
     
     
     for _name, _doc in _string_functions.items():
    -    globals()[_name] = since(1.5)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.5)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    Hm.. can we avoid applying this to all docs too? One way is just to take out the examples. These string function examples look easy and straightforward.
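    The registration pattern in the quoted diff — a `{name: docstring}` table driving a loop that installs generated functions into module globals — can be sketched in isolation. This is a minimal stand-in: the real `_create_function` in `functions.py` forwards to the JVM, so the string-method body here is purely hypothetical.

    ```python
    def _create_function(name, doc):
        """Build a wrapper named `name` carrying docstring `doc`."""
        def _(s):
            # Stand-in body for illustration; the real module dispatches
            # to the corresponding JVM SQL function instead.
            return getattr(s, name)()
        _.__name__ = name
        _.__doc__ = doc
        return _

    _string_functions = {
        'lower': 'Converts a string column to lower case.',
        'upper': 'Converts a string column to upper case.',
    }

    # Install the generated functions as module-level names.
    for _name, _doc in _string_functions.items():
        globals()[_name] = _create_function(_name, _doc)

    print(upper('alice'), lower('BOB'))  # prints: ALICE bob
    ```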




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #78031 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78031/testReport)** for PR 17865 at commit [`8c34c8b`](https://github.com/apache/spark/commit/8c34c8b9073863178ff74c18a01dcf0db32e6805).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Can one of the admins verify this patch?




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115404673
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -153,7 +173,7 @@ def _():
     # math functions that take two arguments as input
     _binary_mathfunctions = {
         'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
    -             'polar coordinates (r, theta).',
    +             'polar coordinates (r, theta). Units in radians.',
    --- End diff --
    
    Most libraries seem to default to radians. However, I checked the R, numpy, and MATLAB documentation for common trigonometry functions, and they all state the units in the function documentation, e.g.:
    https://docs.scipy.org/doc/numpy/reference/generated/numpy.sin.html
    https://www.mathworks.com/help/matlab/ref/sin.html
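For readers following along, the radians convention being documented can be illustrated with plain Python's standard `math` module (a sketch of the same convention, not Spark's implementation):

```python
import math

# sin/cos/tan take their argument in radians -- the convention the
# patch documents for Spark's trigonometry functions.
right_angle = math.radians(90.0)      # 90 degrees -> pi/2 radians
print(math.sin(right_angle))          # 1.0

# atan2(y, x) returns the angle theta from the conversion of rectangular
# coordinates (x, y) to polar coordinates (r, theta), in radians.
theta = math.atan2(1.0, 1.0)          # the point (1, 1) sits at 45 degrees
print(math.degrees(theta))            # ~45.0
```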




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114930366
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -793,8 +824,8 @@ def date_format(date, format):
         .. note:: Use when ever possible specialized functions like `year`. These benefit from a
             specialized implementation.
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
    -    >>> df.select(date_format('a', 'MM/dd/yyy').alias('date')).collect()
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['time'])
    --- End diff --
    
    Okay. I guess it is a documentation improvement to use a bit more meaningful name over an arbitrary name `a`. Let's match these to existing names such as `ts` or `t` (abbreviation for timestamp) or `dt` (abbreviation for `datetime.datetime`).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    I think I addressed everything.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17865




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    @HyukjinKwon I think I addressed all of your comments. Thank you for your detailed review!




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121861095
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -1254,23 +1294,41 @@ def hash(*cols):
     
     # ---------------------- String/Binary functions ------------------------------
     
    +_lower_doc = """
    +    Converts a string column to lower case.
    +
    +    >>> df.select(lower(df.name) ).collect()
    +    [Row(lower(name)=u'alice'), Row(lower(name)=u'bob')]
    +    """
    +_upper_doc = """
    +    Converts a string column to upper case.
    +
    +    >>> df.select(upper(df.name) ).collect()
    +    [Row(upper(name)=u'ALICE'), Row(upper(name)=u'BOB')]
    +    """
    +_reverse_doc = """
    +    Reverses the string column and returns it as a new string column.
    +
    +    >>> df.select(reverse(df.name) ).collect()
    +    [Row(reverse(name)=u'ecilA'), Row(reverse(name)=u'boB')]
    +    """
     _string_functions = {
         'ascii': 'Computes the numeric value of the first character of the string column.',
         'base64': 'Computes the BASE64 encoding of a binary column and returns it as a string column.',
         'unbase64': 'Decodes a BASE64 encoded string column and returns it as a binary column.',
         'initcap': 'Returns a new string column by converting the first letter of each word to ' +
                    'uppercase. Words are delimited by whitespace.',
    -    'lower': 'Converts a string column to lower case.',
    -    'upper': 'Converts a string column to upper case.',
    -    'reverse': 'Reverses the string column and returns it as a new string column.',
    +    'lower': _lower_doc,
    +    'upper': _upper_doc,
    +    'reverse': _reverse_doc,
         'ltrim': 'Trim the spaces from left end for the specified string value.',
         'rtrim': 'Trim the spaces from right end for the specified string value.',
         'trim': 'Trim the spaces from both ends for the specified string column.',
     }
     
     
     for _name, _doc in _string_functions.items():
    -    globals()[_name] = since(1.5)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.5)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    In that way, we should add all examples for each function here. I am not too sure if we need them all for now. At least per - https://issues.apache.org/jira/browse/SPARK-17963?focusedCommentId=15581022&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15581022.
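For context, the behaviour the proposed `_lower_doc`/`_upper_doc`/`_reverse_doc` docstrings describe can be mirrored in plain Python (an analogue for illustration, not how Spark evaluates the column expressions):

```python
# Plain-Python equivalents of the documented behaviour of
# lower / upper / reverse on the example names from the docstrings.
names = ['Alice', 'Bob']

print([n.lower() for n in names])   # ['alice', 'bob']
print([n.upper() for n in names])   # ['ALICE', 'BOB']
print([n[::-1] for n in names])     # ['ecilA', 'boB']
```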




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #76477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76477/testReport)** for PR 17865 at commit [`91515c6`](https://github.com/apache/spark/commit/91515c620287e193c6d208038025fe194740e4d2).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    @gatorsmile I checked four functions, `approx_count_distinct`, `coalesce`, `covar_samp`, and `countDistinct`, comparing the python and Scala documentation. None of them are the same. My guess is that the python docs differ for most functions.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    I will take a look soon to help, too. Sorry for the delay.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123960210
  
    --- Diff: R/pkg/R/functions.R ---
    @@ -426,7 +426,7 @@ setMethod("covar_pop", signature(col1 = "characterOrColumn", col2 = "characterOr
     
     #' cos
     #'
    -#' Computes the cosine of the given value.
    +#' Computes the cosine of the given value. Units in radians.
    --- End diff --
    
    Yes, if that one is merged first, we should resolve the conflict here. If this one gets merged first, that one should resolve the conflict. Describing the same information across APIs looks reasonable to me.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #76763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76763/testReport)** for PR 17865 at commit [`dd7a397`](https://github.com/apache/spark/commit/dd7a3971275aae97f9adfd82166df23b038eb950).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123912061
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -969,8 +1005,8 @@ def months_between(date1, date2):
         """
         Returns the number of months between date1 and date2.
     
    -    >>> df = spark.createDataFrame([('1997-02-28 10:30:00', '1996-10-30')], ['t', 'd'])
    -    >>> df.select(months_between(df.t, df.d).alias('months')).collect()
    +    >>> df = spark.createDataFrame([('1997-02-28 10:30:00', '1996-10-30')], ['date1', 'date2'])
    --- End diff --
    
    I reverted all the other ones, but I like this one because `date1` and `date2` match the docstring. Is that reasonable?
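For readers, the quantity the docstring example reports can be sketched in plain Python. This assumes Spark's documented convention that leftover days count as 31sts of a month; Spark's actual implementation also accounts for the time-of-day portion and end-of-month cases, so this is only a rough analogue:

```python
from datetime import date

def months_between(date1: date, date2: date) -> float:
    """Rough sketch: whole months plus leftover days as 1/31 of a month."""
    whole_months = (date1.year - date2.year) * 12 + (date1.month - date2.month)
    return whole_months + (date1.day - date2.day) / 31.0

# Mirrors the docstring example's inputs (ignoring the 10:30:00 time part).
print(round(months_between(date(1997, 2, 28), date(1996, 10, 30)), 8))
```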




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121539157
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -189,15 +210,15 @@ def _():
     }
     
     for _name, _doc in _functions.items():
    -    globals()[_name] = since(1.3)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.3)(ignore_unicode_prefix(_create_function(_name, _doc)))
     for _name, _doc in _functions_1_4.items():
         globals()[_name] = since(1.4)(_create_function(_name, _doc))
     for _name, _doc in _binary_mathfunctions.items():
         globals()[_name] = since(1.4)(_create_binary_mathfunction(_name, _doc))
     for _name, _doc in _window_functions.items():
         globals()[_name] = since(1.6)(_create_window_function(_name, _doc))
     for _name, _doc in _functions_1_6.items():
    -    globals()[_name] = since(1.6)(_create_function(_name, _doc))
    +    globals()[_name] = since(1.6)(ignore_unicode_prefix(_create_function(_name, _doc)))
    --- End diff --
    
    This one also looks like the same kind of instance. Let's avoid adding unicode examples in `_collect_list_doc` and `_collect_set_doc` for now if we are all fine with that.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79207 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79207/testReport)** for PR 17865 at commit [`9cc34c8`](https://github.com/apache/spark/commit/9cc34c895eaad5ba4366308a0324c8adb2f9510e).




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123245717
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -200,17 +226,20 @@ def _():
     @since(1.3)
     def approxCountDistinct(col, rsd=None):
         """
    -    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
    +    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct` instead.
         """
         return approx_count_distinct(col, rsd)
     
     
     @since(2.1)
    -def approx_count_distinct(col, rsd=None):
    -    """Returns a new :class:`Column` for approximate distinct count of ``col``.
    +def approx_count_distinct(col, rsd=0.05):
    --- End diff --
    
    Okay, I double-checked that some APIs have them, but let's leave this out. It sounds like it would require fixing/checking the logic below and the deprecated function above. I assume this should be a doc-only change PR.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78183/
    Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79118 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79118/testReport)** for PR 17865 at commit [`f17f332`](https://github.com/apache/spark/commit/f17f332dd97b948f8dd31eb2b18c1e11dc7fead0).




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121543259
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -962,9 +993,9 @@ def add_months(start, months):
         """
         Returns the date that is `months` months after `start`
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['d'])
    -    >>> df.select(add_months(df.d, 1).alias('d')).collect()
    -    [Row(d=datetime.date(2015, 5, 8))]
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
    --- End diff --
    
    I think changing arbitrary name to `dt` or `ts` is fine but not worth changing `t`, `timestamp`, `d` or `date`. These are already readable.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r123184653
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -793,9 +824,9 @@ def date_format(date, format):
         .. note:: Use when ever possible specialized functions like `year`. These benefit from a
             specialized implementation.
     
    -    >>> df = spark.createDataFrame([('2015-04-08',)], ['a'])
    -    >>> df.select(date_format('a', 'MM/dd/yyy').alias('date')).collect()
    -    [Row(date=u'04/08/2015')]
    +    >>> df = spark.createDataFrame([('2015-04-08',)], ['dt'])
    +    >>> df.select(date_format('dt', 'MM/dd/yyy').alias('dt2')).collect()
    --- End diff --
    
    Renaming the alias `date` to `dt2` seems unnecessary.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121491632
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -109,15 +117,29 @@ def _():
         'rint': 'Returns the double value that is closest in value to the argument and' +
                 ' is equal to a mathematical integer.',
         'signum': 'Computes the signum of the given value.',
    -    'sin': 'Computes the sine of the given value.',
    -    'sinh': 'Computes the hyperbolic sine of the given value.',
    -    'tan': 'Computes the tangent of the given value.',
    -    'tanh': 'Computes the hyperbolic tangent of the given value.',
    -    'toDegrees': '.. note:: Deprecated in 2.1, use degrees instead.',
    -    'toRadians': '.. note:: Deprecated in 2.1, use radians instead.',
    +    'sin': 'Computes the sine of the given value. Units in radians',
    +    'sinh': 'Computes the hyperbolic sine of the given value. Units in radians.',
    +    'tan': 'Computes the tangent of the given value. Units in radians.',
    +    'tanh': 'Computes the hyperbolic tangent of the given value. Units in radians.',
    +    'toDegrees': '.. note:: Deprecated in 2.1, use :func:`degrees` instead.',
    +    'toRadians': '.. note:: Deprecated in 2.1, use :func:`radians` instead.',
         'bitwiseNOT': 'Computes bitwise not.',
     }
     
    +_collect_list_doc = """
    +    Aggregate function: returns a list of objects with duplicates.
    +
    +    >>> df2 = spark.createDataFrame([('Alice', 2), ('Bob', 5),('Alice', 99)], ('name', 'age'))
    --- End diff --
    
    nit: add a space after `('Bob', 5),`.
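For readers, the grouping semantics of the `_collect_list_doc` example above can be sketched in plain Python (an analogue for illustration, not how Spark aggregates):

```python
# Rows from the proposed docstring example. collect_list keeps duplicates
# and per-group arrival order; collect_set would de-duplicate the values.
rows = [('Alice', 2), ('Bob', 5), ('Alice', 99)]

ages_by_name = {}
for name, age in rows:
    ages_by_name.setdefault(name, []).append(age)

print(ages_by_name['Alice'])               # [2, 99]  (collect_list-like)
print(sorted(set(ages_by_name['Alice'])))  # [2, 99]  (collect_set-like; no duplicates here)
```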




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    **[Test build #79162 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79162/testReport)** for PR 17865 at commit [`f5d0d0f`](https://github.com/apache/spark/commit/f5d0d0f2d1649305213e50d9e76180f449027a4a).




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78029/
    Test FAILed.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114929597
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -206,17 +226,20 @@ def _():
     @since(1.3)
     def approxCountDistinct(col, rsd=None):
         """
    -    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
    +    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct instead`.
    --- End diff --
    
    Probably `` :func:`approx_count_distinct` ``?




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121494100
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -479,10 +505,15 @@ def rand(seed=None):
         return Column(jc)
     
     
    +@ignore_unicode_prefix
     @since(1.4)
     def randn(seed=None):
         """Generates a column with independent and identically distributed (i.i.d.) samples from
         the standard normal distribution.
    +
    +    >>> df.withColumn('randn',randn(seed=42) ).collect()
    --- End diff --
    
    nit: add a space after `'randn',` and remove a space after `randn(seed=42)`.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78158/
    Test FAILed.




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by holdenk <gi...@git.apache.org>.
Github user holdenk commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Thanks for working on this :) More complete documentation is certainly useful. Looking at the "How was this patch tested?" section, you might also want to build the docs (there are instructions in `docs/README.md`) to see how the output is formatted.




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r121541415
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -206,17 +227,20 @@ def _():
     @since(1.3)
     def approxCountDistinct(col, rsd=None):
         """
    -    .. note:: Deprecated in 2.1, use approx_count_distinct instead.
    +    .. note:: Deprecated in 2.1, use :func:`approx_count_distinct` instead.
         """
         return approx_count_distinct(col, rsd)
     
     
     @since(2.1)
     def approx_count_distinct(col, rsd=None):
    -    """Returns a new :class:`Column` for approximate distinct count of ``col``.
    +    """Returns a new :class:`Column` for approximate distinct count of column `col`.
     
    -    >>> df.agg(approx_count_distinct(df.age).alias('c')).collect()
    -    [Row(c=2)]
    +    :param rsd: Residual (float). The approximate count will be within this fraction of the true
    --- End diff --
    
    For this one, let's add the contents in Scala/R one too - https://github.com/apache/spark/blob/3a840048ed3501e06260b7c5df18cc0bbdb1505c/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L245 and https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/functions.R#L2243
    
    The Scala/R versions already have the parameter documentation to match against.
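As a hedged sketch of what the matched parameter documentation could look like, with wording adapted from the linked Scala docs (the stub name is hypothetical; the exact phrasing the PR settles on may differ):

```python
def approx_count_distinct_stub(col, rsd=None):
    """Returns a new :class:`Column` for the approximate distinct count of ``col``.

    :param rsd: maximum estimation error allowed (default = 0.05)
    """

# The :param: field documents rsd consistently with the Scala/R docs.
assert ":param rsd:" in approx_count_distinct_stub.__doc__
```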




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r115889357
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -153,7 +173,7 @@ def _():
     # math functions that take two arguments as input
     _binary_mathfunctions = {
         'atan2': 'Returns the angle theta from the conversion of rectangular coordinates (x, y) to' +
    -             'polar coordinates (r, theta).',
    +             'polar coordinates (r, theta). Units in radians.',
    --- End diff --
    
    I added `:param:` to both `acos` and `asin` using two methods. What do you think?
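For context on the units being documented, the radians convention matches the underlying math libraries; illustrated here with Python's stdlib `math` module rather than the Spark functions themselves:

```python
import math

# atan2(y, x) returns theta from the conversion of rectangular
# coordinates (x, y) to polar coordinates (r, theta), in radians.
assert math.isclose(math.atan2(1.0, 1.0), math.pi / 4)   # 45 degrees
assert math.isclose(math.atan2(1.0, 0.0), math.pi / 2)   # 90 degrees
assert math.isclose(math.atan2(0.0, -1.0), math.pi)      # 180 degrees

# acos/asin likewise return radians, hence the proposed :param: notes.
assert math.isclose(math.acos(0.5), math.pi / 3)
```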




[GitHub] spark pull request #17865: [SPARK-20456][Docs] Add examples for functions co...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17865#discussion_r114929075
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -131,9 +152,8 @@ def _():
         'var_pop':  'Aggregate function: returns the population variance of the values in a group.',
         'skewness': 'Aggregate function: returns the skewness of the values in a group.',
         'kurtosis': 'Aggregate function: returns the kurtosis of the values in a group.',
    -    'collect_list': 'Aggregate function: returns a list of objects with duplicates.',
    -    'collect_set': 'Aggregate function: returns a set of objects with duplicate elements' +
    -                   ' eliminated.',
    +    'collect_list': _collect_list_doc,
    --- End diff --
    
    Let's wrap it (and the other instances) with `ignore_unicode_prefix`, as was done before. Please refer to https://github.com/apache/spark/blob/8ddf0d2a60795a2306f94df8eac6e265b1fe5230/python/pyspark/rdd.py#L146-L156
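For readers unfamiliar with the decorator: `ignore_unicode_prefix` rewrites `u'...'` literals out of a function's doctests so the same expected output passes under Python 3, where `repr()` of a string carries no unicode prefix. A minimal sketch of the idea (not the actual Spark implementation, which applies the rewrite conditionally by Python version):

```python
import re

def ignore_unicode_prefix_sketch(f):
    # Strip the u'' / u"" prefix from doctest expected output so it
    # matches Python 3 string reprs.
    if f.__doc__ is not None:
        f.__doc__ = re.sub(r"\bu(['\"])", r"\1", f.__doc__)
    return f

@ignore_unicode_prefix_sketch
def example():
    """
    >>> 'hello'
    u'hello'
    """

assert "u'hello'" not in example.__doc__
assert "'hello'" in example.__doc__
```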




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Looks like there is a merge conflict. Can you fix it?




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    (Thank you @gatorsmile for triggering the test)




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by map222 <gi...@git.apache.org>.
Github user map222 commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Could I get a review for this? I think the only remaining question is whether (and how) to note the units for the trigonometry functions, like here:
    https://github.com/map222/spark/blob/dd7a3971275aae97f9adfd82166df23b038eb950/python/pyspark/sql/functions.py#L100-L105




[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17865
  
    Merged build finished. Test PASSed.

