Posted to issues@spark.apache.org by "ZygD (Jira)" <ji...@apache.org> on 2022/04/01 15:24:00 UTC
[jira] [Updated] (SPARK-38614) After Spark update, df.show() shows incorrect F.percent_rank results
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZygD updated SPARK-38614:
-------------------------
Component/s: SQL
> After Spark update, df.show() shows incorrect F.percent_rank results
> --------------------------------------------------------------------
>
> Key: SPARK-38614
> URL: https://issues.apache.org/jira/browse/SPARK-38614
> Project: Spark
> Issue Type: Bug
> Components: PySpark, SQL
> Affects Versions: 3.2.0, 3.2.1
> Reporter: ZygD
> Priority: Major
> Labels: correctness
>
> The expected result is obtained with Spark 3.1.2, but not with 3.2.0 or 3.2.1.
> *Minimal reproducible example*
> {code:java}
> from pyspark.sql import SparkSession, functions as F, Window as W
> spark = SparkSession.builder.getOrCreate()
>
> df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
> df.show(3)
> df.show(5)
> {code}
> *Expected result*
> {code:java}
> +---+----+
> | id| pr|
> +---+----+
> | 0| 0.0|
> | 1|0.01|
> | 2|0.02|
> +---+----+
> only showing top 3 rows
> +---+----+
> | id| pr|
> +---+----+
> | 0| 0.0|
> | 1|0.01|
> | 2|0.02|
> | 3|0.03|
> | 4|0.04|
> +---+----+
> only showing top 5 rows{code}
> *Actual result*
> {code:java}
> +---+------------------+
> | id| pr|
> +---+------------------+
> | 0| 0.0|
> | 1|0.3333333333333333|
> | 2|0.6666666666666666|
> +---+------------------+
> only showing top 3 rows
> +---+---+
> | id| pr|
> +---+---+
> | 0|0.0|
> | 1|0.2|
> | 2|0.4|
> | 3|0.6|
> | 4|0.8|
> +---+---+
> only showing top 5 rows{code}
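Editor's note: the skewed values are consistent with percent_rank, defined as (rank - 1) / (partition_size - 1), being evaluated over only the rows fetched for display rather than the full 101-row window. Since show(n) fetches n + 1 rows (one extra to decide whether to print "only showing top n rows"), the denominators become 3 and 5. A quick arithmetic check of that hypothesis in plain Python (no Spark; the n + 1 fetch is an assumption about show()'s internals, not confirmed in this report):

```python
# percent_rank over a partition: (rank - 1) / (partition_size - 1),
# with rank starting at 1, so (rank - 1) is just the row index.
def percent_rank(partition_size):
    return [i / (partition_size - 1) for i in range(partition_size)]

# Expected behaviour: rank over the full 101-row window.
print(percent_rank(101)[:3])      # [0.0, 0.01, 0.02]

# Buggy behaviour, assuming show(n) fetches n + 1 rows and the
# window is computed over just those rows:
print(percent_rank(3 + 1)[:3])    # [0.0, 0.3333333333333333, 0.6666666666666666]
print(percent_rank(5 + 1)[:5])    # [0.0, 0.2, 0.4, 0.6, 0.8]
```

These reproduce the "actual result" tables above exactly, which suggests a limit being pushed below the window operator in 3.2.x.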
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org