Posted to issues@spark.apache.org by "ZygD (Jira)" <ji...@apache.org> on 2022/04/01 15:24:00 UTC

[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results

     [ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Component/s: SQL

> After Spark update, df.show() shows incorrect F.percent_rank results
> --------------------------------------------------------------------
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 3.2.0, 3.2.1
>            Reporter: ZygD
>            Priority: Major
>              Labels: correctness
>
> The expected result is obtained with Spark 3.1.2, but not with 3.2.0 or 3.2.1.
> *Minimal reproducible example*
> {code:python}
> from pyspark.sql import SparkSession, functions as F, Window as W
>
> spark = SparkSession.builder.getOrCreate()
>
> # 101 rows (id 0..100), so percent_rank over id should equal id / 100
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5)
> {code}
> *Expected result*
> {code:java}
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id|  pr|
> +---+----+
> |  0| 0.0|
> |  1|0.01|
> |  2|0.02|
> |  3|0.03|
> |  4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+------------------+
> | id|                pr|
> +---+------------------+
> |  0|               0.0|
> |  1|0.3333333333333333|
> |  2|0.6666666666666666|
> +---+------------------+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> |  0|0.0|
> |  1|0.2|
> |  2|0.4|
> |  3|0.6|
> |  4|0.8|
> +---+---+
> only showing top 5 rows{code}
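For reference, here is a plain-Python sketch of the usual percent_rank definition, (rank - 1) / (n - 1), where n is the number of rows in the window partition. The helper name percent_rank_py is hypothetical and is not part of the Spark API. Applied to all 101 rows, it reproduces the expected output. Applied to only the first 4 or 6 rows, it reproduces the actual output of show(3) and show(5). That the window ends up being evaluated over n + 1 rows for show(n) is an assumption inferred from these numbers; the report itself does not confirm it.

```python
def percent_rank_py(values):
    """Return (rank - 1) / (n - 1) for each value, with rank = 1 + count of smaller values.

    Hypothetical helper for illustration only; not part of the Spark API.
    """
    n = len(values)
    if n <= 1:
        # A single-row partition has percent rank 0.0.
        return [0.0 for _ in values]
    return [sum(v < x for v in values) / (n - 1) for x in values]


ids = list(range(101))

# Window over all 101 rows: matches the expected result.
expected_top5 = percent_rank_py(ids)[:5]

# Window over only the first n + 1 rows (assumed): matches the actual result.
actual_show3 = percent_rank_py(ids[:4])[:3]
actual_show5 = percent_rank_py(ids[:6])[:5]

print(expected_top5)  # [0.0, 0.01, 0.02, 0.03, 0.04]
print(actual_show3)   # [0.0, 0.3333333333333333, 0.6666666666666666]
print(actual_show5)   # [0.0, 0.2, 0.4, 0.6, 0.8]
```

If this reading is right, the values depend on how many rows reach the window operator rather than on the size of the full DataFrame, which would explain why df.show(3) and df.show(5) disagree with each other.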


