Posted to reviews@spark.apache.org by gatorsmile <gi...@git.apache.org> on 2018/01/11 12:31:49 UTC

[GitHub] spark pull request #20234: [SPARK-19732] [Follow-up] Document behavior chang...

GitHub user gatorsmile opened a pull request:

    https://github.com/apache/spark/pull/20234

    [SPARK-19732] [Follow-up] Document behavior changes made in na.fill and fillna

    ## What changes were proposed in this pull request?
    https://github.com/apache/spark/pull/18164 introduced the behavior changes. We need to document them.
    
    ## How was this patch tested?
    N/A
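
For readers of this thread, the behavior change being documented can be pictured with a small plain-Python model. This is only a sketch and not Spark code: the function `fill_nulls`, the `accept_bool` flag, and the type labels in `schema` are hypothetical names invented for illustration, `None` stands in for a SQL null, and the rule that a boolean only fills boolean columns is an assumption of the sketch.

```python
# Plain-Python model of the documented behavior; this is NOT Spark code.
# Rows are dicts, None stands in for SQL null, and `schema` maps column
# names to hypothetical type labels chosen for this sketch.

def fill_nulls(rows, value, schema, accept_bool=True):
    """Return copies of `rows` with nulls replaced in type-compatible columns."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        if not accept_bool:
            # Model of the earlier PySpark behavior described in the note:
            # the boolean is ignored and the data comes back unchanged.
            return [dict(row) for row in rows]
        target = "boolean"
    elif isinstance(value, (int, float)):
        target = "numeric"
    elif isinstance(value, str):
        target = "string"
    else:
        raise TypeError("value must be a bool, number, or string")

    # Replace a cell only when it is null and its column type matches.
    return [
        {
            col: value if cell is None and schema[col] == target else cell
            for col, cell in row.items()
        }
        for row in rows
    ]


schema = {"name": "string", "active": "boolean"}
rows = [
    {"name": "a", "active": None},
    {"name": None, "active": False},
]

filled = fill_nulls(rows, True, schema)                     # newer behavior
legacy = fill_nulls(rows, True, schema, accept_bool=False)  # earlier behavior
```

In this model, `filled` has the null in the `active` column replaced by `True` while the null in `name` is left alone, and `legacy` is an unchanged copy of `rows`.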

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark docBehaviorChange

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20234.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20234
    
----
commit 89f00867f319cddf5ff49442b9ab38f1cefe837e
Author: gatorsmile <ga...@...>
Date:   2018-01-11T03:10:11Z

    fix

commit 09193499f522dee74d99262347e254e028e9d636
Author: gatorsmile <ga...@...>
Date:   2018-01-11T03:10:35Z

    clean

commit ff30553092a7bfe8d9aac3fc1f89b99ff679a2aa
Author: gatorsmile <ga...@...>
Date:   2018-01-11T03:11:56Z

    fix

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20234: [SPARK-19732] [Follow-up] Document behavior chang...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20234#discussion_r160952123
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1788,12 +1788,10 @@ options.
         Note that, for <b>DecimalType(38,0)*</b>, the table above intentionally does not cover all other combinations of scales and precisions because currently we only infer decimal type like `BigInteger`/`BigInt`. For example, 1.1 is inferred as double type.
       - In PySpark, now we need Pandas 0.19.2 or upper if you want to use Pandas related functionalities, such as `toPandas`, `createDataFrame` from Pandas DataFrame, etc.
       - In PySpark, the behavior of timestamp values for Pandas related functionalities was changed to respect session timezone. If you want to use the old behavior, you need to set a configuration `spark.sql.execution.pandas.respectSessionTimeZone` to `False`. See [SPARK-22395](https://issues.apache.org/jira/browse/SPARK-22395) for details.
    -
    - - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).
    -
    - - Since Spark 2.3, when all inputs are binary, `functions.concat()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.concatBinaryAsString` to `true`.
    -
    - - Since Spark 2.3, when all inputs are binary, SQL `elt()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.eltOutputAsString` to `true`.
    +  - In PySpark, `na.fill()` or `fillna` also accepts boolean and replaces NAs with booleans. In prior Spark versions, PySpark just ignores it and returns the original Dataset/DataFrame.  
    --- End diff --
    
    Sounds good to me.


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85963/
    Test PASSed.


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    **[Test build #85963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85963/testReport)** for PR 20234 at commit [`ff30553`](https://github.com/apache/spark/commit/ff30553092a7bfe8d9aac3fc1f89b99ff679a2aa).


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    **[Test build #85965 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85965/testReport)** for PR 20234 at commit [`a2475ea`](https://github.com/apache/spark/commit/a2475ea5b86acee2380884db0756a833016b69a0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    Merged to master and branch-2.3.


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    **[Test build #85963 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85963/testReport)** for PR 20234 at commit [`ff30553`](https://github.com/apache/spark/commit/ff30553092a7bfe8d9aac3fc1f89b99ff679a2aa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20234: [SPARK-19732] [Follow-up] Document behavior chang...

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20234#discussion_r160949121
  
    --- Diff: docs/sql-programming-guide.md ---
    @@ -1788,12 +1788,10 @@ options.
         Note that, for <b>DecimalType(38,0)*</b>, the table above intentionally does not cover all other combinations of scales and precisions because currently we only infer decimal type like `BigInteger`/`BigInt`. For example, 1.1 is inferred as double type.
       - In PySpark, now we need Pandas 0.19.2 or upper if you want to use Pandas related functionalities, such as `toPandas`, `createDataFrame` from Pandas DataFrame, etc.
       - In PySpark, the behavior of timestamp values for Pandas related functionalities was changed to respect session timezone. If you want to use the old behavior, you need to set a configuration `spark.sql.execution.pandas.respectSessionTimeZone` to `False`. See [SPARK-22395](https://issues.apache.org/jira/browse/SPARK-22395) for details.
    -
    - - Since Spark 2.3, when either broadcast hash join or broadcast nested loop join is applicable, we prefer to broadcasting the table that is explicitly specified in a broadcast hint. For details, see the section [Broadcast Hint](#broadcast-hint-for-sql-queries) and [SPARK-22489](https://issues.apache.org/jira/browse/SPARK-22489).
    -
    - - Since Spark 2.3, when all inputs are binary, `functions.concat()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.concatBinaryAsString` to `true`.
    -
    - - Since Spark 2.3, when all inputs are binary, SQL `elt()` returns an output as binary. Otherwise, it returns as a string. Until Spark 2.3, it always returns as a string despite of input types. To keep the old behavior, set `spark.sql.function.eltOutputAsString` to `true`.
    +  - In PySpark, `na.fill()` or `fillna` also accepts boolean and replaces NAs with booleans. In prior Spark versions, PySpark just ignores it and returns the original Dataset/DataFrame.  
    --- End diff --
    
    Shall we say `null` instead of `NA`? I actually think `null` is more correct.


---



[GitHub] spark pull request #20234: [SPARK-19732] [Follow-up] Document behavior chang...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20234


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85965/
    Test PASSed.


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    **[Test build #85965 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85965/testReport)** for PR 20234 at commit [`a2475ea`](https://github.com/apache/spark/commit/a2475ea5b86acee2380884db0756a833016b69a0).


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20234: [SPARK-19732] [Follow-up] Document behavior changes made...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20234
  
    cc @rberenguel @ueshin @HyukjinKwon 


---
