Posted to commits@spark.apache.org by ru...@apache.org on 2022/11/10 07:42:48 UTC
[spark] branch master updated: [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `

This is an automated email from the ASF dual-hosted git repository.

ruifengz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 40a9a6ef5b8 [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `
40a9a6ef5b8 is described below

commit 40a9a6ef5b89f0c3d19db4a43b8a73decaa173c3
Author: Ruifeng Zheng <ru...@apache.org>
AuthorDate: Thu Nov 10 15:42:19 2022 +0800

    [SPARK-40877][DOC][FOLLOW-UP] Update the doc of `DataFrame.stat.crosstab `
    
    ### What changes were proposed in this pull request?
    Remove the outdated comments about size limits from the `crosstab` docs.
    
    ### Why are the changes needed?
    The documented limits (fewer than 1e4 distinct values per column, at most 1e6 non-zero pair frequencies) no longer hold after the [reimplementation](https://github.com/apache/spark/pull/38340).
    
    ### Does this PR introduce _any_ user-facing change?
    yes
    
    ### How was this patch tested?
    Doc-only change.
    
    Closes #38579 from zhengruifeng/doc_crosstab.
    
    Lead-authored-by: Ruifeng Zheng <ru...@apache.org>
    Co-authored-by: Ruifeng Zheng <ru...@foxmail.com>
    Signed-off-by: Ruifeng Zheng <ru...@apache.org>
---
 python/pyspark/sql/dataframe.py                                        | 3 +--
 .../src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala   | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/python/pyspark/sql/dataframe.py b/python/pyspark/sql/dataframe.py
index 3c787f8900f..6d5014918bf 100644
--- a/python/pyspark/sql/dataframe.py
+++ b/python/pyspark/sql/dataframe.py
@@ -4217,8 +4217,7 @@ class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
     def crosstab(self, col1: str, col2: str) -> "DataFrame":
         """
         Computes a pair-wise frequency table of the given columns. Also known as a contingency
-        table. The number of distinct values for each column should be less than 1e4. At most 1e6
-        non-zero pair frequencies will be returned.
+        table.
         The first column of each row will be the distinct values of `col1` and the column names
         will be the distinct values of `col2`. The name of the first column will be `$col1_$col2`.
         Pairs that have no occurrences will have zero as their counts.
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
index efd430633d7..7511c21fa76 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala
@@ -181,8 +181,6 @@ final class DataFrameStatFunctions private[sql](df: DataFrame) {
 
   /**
    * Computes a pair-wise frequency table of the given columns. Also known as a contingency table.
-   * The number of distinct values for each column should be less than 1e4. At most 1e6 non-zero
-   * pair frequencies will be returned.
    * The first column of each row will be the distinct values of `col1` and the column names will
    * be the distinct values of `col2`. The name of the first column will be `col1_col2`. Counts
    * will be returned as `Long`s. Pairs that have no occurrences will have zero as their counts.

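For readers unfamiliar with the output shape the updated docstrings describe, here is a minimal plain-Python sketch of those semantics. It is not Spark's implementation: the function name `crosstab_sketch`, the sample rows, and the string-based sort order are assumptions made only for illustration.

```python
from collections import Counter


def crosstab_sketch(rows, col1, col2):
    """Illustrate the documented crosstab output shape (hypothetical helper)."""
    # Count how often each (col1 value, col2 value) pair occurs.
    counts = Counter((row[col1], row[col2]) for row in rows)

    # Distinct values of each column; sorted as strings only so that this
    # sketch is deterministic. Spark's actual ordering is not modelled here.
    row_keys = sorted({k1 for k1, _ in counts}, key=str)
    col_keys = sorted({k2 for _, k2 in counts}, key=str)

    # The first column is named "<col1>_<col2>"; the remaining column names
    # are the distinct values of col2.
    header = [f"{col1}_{col2}"] + [str(k) for k in col_keys]

    # Pairs that never occur get a count of zero.
    table = [
        [str(k1)] + [counts[(k1, k2)] for k2 in col_keys]
        for k1 in row_keys
    ]
    return header, table


rows = [
    {"c1": 1, "c2": 11},
    {"c1": 1, "c2": 12},
    {"c1": 3, "c2": 10},
    {"c1": 4, "c2": 8},
    {"c1": 4, "c2": 8},
]
header, table = crosstab_sketch(rows, "c1", "c2")
```

In PySpark the corresponding call on a DataFrame `df` with columns `c1` and `c2` would be `df.crosstab("c1", "c2")`, with the first result column named `c1_c2`.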
