
Posted to commits@spark.apache.org by gu...@apache.org on 2022/09/19 09:30:09 UTC

[spark] branch master updated: [SPARK-40447][PS][FOLLOWUP] Fix doc of `DataFrame.corr`

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 2317f13b7d0 [SPARK-40447][PS][FOLLOWUP] Fix doc of `DataFrame.corr`
2317f13b7d0 is described below

commit 2317f13b7d0db83a46449e55f6053dc8462ebb94
Author: Ruifeng Zheng <ru...@apache.org>
AuthorDate: Mon Sep 19 18:29:55 2022 +0900

    [SPARK-40447][PS][FOLLOWUP] Fix doc of `DataFrame.corr`
    
    ### What changes were proposed in this pull request?
    Fix the doc of `DataFrame.corr`: it is the `Kendall` implementation in PS, not `Spearman`, that has complexity O(#row * #row), since it applies a cross join (within each partition) to compute the statistics.
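
The quadratic cost comes from comparing every pair of rows. The following is a minimal pure-Python sketch of that idea, not the pandas-on-Spark implementation. The function name `kendall_tau_b` is hypothetical, and the choice of the tau-b variant (which adjusts for ties) is an assumption made for illustration. The all-pairs loop does the same kind of work a self cross join materializes: n * (n - 1) / 2 comparisons for n rows.

```python
import itertools
import math


def kendall_tau_b(xs, ys):
    """Hypothetical helper: Kendall's tau-b via an explicit all-pairs loop, O(n^2)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    # Every unordered pair of rows is visited once: n * (n - 1) / 2 pairs,
    # similar to a self cross join filtered on i < j.
    for (x1, y1), (x2, y2) in itertools.combinations(zip(xs, ys), 2):
        dx = x1 - x2
        dy = y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both columns: excluded from tau-b entirely
        elif dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif (dx > 0) == (dy > 0):
            concordant += 1  # both columns move in the same direction
        else:
            discordant += 1  # the columns move in opposite directions
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    if denom == 0:
        return float("nan")  # undefined when a column is constant
    return (concordant - discordant) / denom


print(kendall_tau_b([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant pair
```

Because the pair count grows with the square of the row count, 10,000 rows already mean 49,995,000 comparisons, while a 10% sample of 1,000 rows needs only 499,500, roughly 1% of the work. This is why the docstring recommends sampling before computing the correlation on a large dataset.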
    
    ### Why are the changes needed?
    The doc of `DataFrame.corr` attributed the O(#row * #row) complexity to Spearman correlation instead of Kendall correlation.
    
    ### Does this PR introduce _any_ user-facing change?
    yes, doc fixed
    
    ### How was this patch tested?
    Manually checked.
    
    Closes #37927 from zhengruifeng/ps_df_kendall_doc.
    
    Authored-by: Ruifeng Zheng <ru...@apache.org>
    Signed-off-by: Hyukjin Kwon <gu...@apache.org>
---
 python/pyspark/pandas/frame.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/python/pyspark/pandas/frame.py b/python/pyspark/pandas/frame.py
index d7b26cacda3..e2b70caf5d7 100644
--- a/python/pyspark/pandas/frame.py
+++ b/python/pyspark/pandas/frame.py
@@ -1448,7 +1448,7 @@ class DataFrame(Frame, Generic[T]):
         1. Pearson, Kendall and Spearman correlation are currently computed using pairwise
            complete observations.
 
-        2. The complexity of Spearman correlation is O(#row * #row), if the dataset is too
+        2. The complexity of Kendall correlation is O(#row * #row), if the dataset is too
            large, sampling ahead of correlation computation is recommended.
 
         Examples

