Posted to issues@spark.apache.org by "ZygD (Jira)" <ji...@apache.org> on 2022/03/21 12:46:00 UTC
[jira] [Updated] (SPARK-38614) df.show(3) does not equal df.show() first rows
[ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ZygD updated SPARK-38614:
-------------------------
Description:
*Minimal reproducible example*
{code:python}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()

df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
{code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
{code}
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
{code}
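*Note on the numbers*
One observation about the values above, offered as a hypothesis rather than a confirmed root cause: the actual results match what percent_rank would return if the window saw only n + 1 rows (4 rows for df.show(3), 6 rows for df.show(5)) instead of all 101. The pure-Python sketch below checks that arithmetic with the standard definition (rank - 1) / (rows - 1). The helper name percent_rank_values is made up for illustration, and the connection to how show(n) fetches rows is an assumption that has not been verified against the Spark source here.

```python
def percent_rank_values(n_rows):
    """percent_rank for n_rows distinct, ordered values: (rank - 1) / (n_rows - 1).

    Hypothetical helper for illustration only; it is not part of PySpark.
    """
    if n_rows < 2:
        # A single-row (or empty) partition has percent_rank 0.0 for every row.
        return [0.0] * n_rows
    return [(rank - 1) / (n_rows - 1) for rank in range(1, n_rows + 1)]


# Expected: the window spans all 101 rows produced by spark.range(101).
expected_first5 = percent_rank_values(101)[:5]

# Assumption: the window spans only n + 1 rows when show(n) is called.
actual_show3 = percent_rank_values(3 + 1)[:3]
actual_show5 = percent_rank_values(5 + 1)[:5]

print(expected_first5)  # [0.0, 0.01, 0.02, 0.03, 0.04]
print(actual_show3)     # [0.0, 0.3333333333333333, 0.6666666666666666]
print(actual_show5)     # [0.0, 0.2, 0.4, 0.6, 0.8]
```

If the hypothesis holds, it would suggest the row limit is applied before the window function is evaluated, which should be checkable with df.limit(n) and the query plan from df.explain().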
was:
*Minimal reproducible example*
```python
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
```
*Expected result*
```none
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
```
*Actual result*
```none
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
```
> df.show(3) does not equal df.show() first rows
> ----------------------------------------------
>
> Key: SPARK-38614
> URL: https://issues.apache.org/jira/browse/SPARK-38614
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 3.2.1
> Reporter: ZygD
> Priority: Major