Posted to issues@spark.apache.org by "ZygD (Jira)" <ji...@apache.org> on 2022/03/21 12:30:00 UTC

[jira] [Created] (SPARK-38614) df.show(3) does not equal df.show() first rows

ZygD created SPARK-38614:
----------------------------

             Summary: df.show(3) does not equal df.show() first rows
                 Key: SPARK-38614
                 URL: https://issues.apache.org/jira/browse/SPARK-38614
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 3.2.1
            Reporter: ZygD


*Minimal reproducible example*

```python
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
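# 101 rows with ids 0..100, so percent_rank over the whole frame is id / 100
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)  # first 3 rows
df.show(5)  # first 5 rows; the first 3 should match the call above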
```

*Expected result*

```none
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
```

*Actual result*

```none
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
```
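
*Observation on the numbers*

The values in the actual result look like `percent_rank` evaluated over only the first n + 1 rows (4 rows for `show(3)`, 6 rows for `show(5)`) rather than over all 101 rows. The plain-Python sketch below checks only that arithmetic. It does not call Spark, the helper names `percent_rank` and `shown` are hypothetical, and the "n + 1 rows" window size is an assumption inferred from the output, not a confirmed diagnosis.

```python
def percent_rank(n_rows):
    """percent_rank for n_rows distinct ordered values: (rank - 1) / (n_rows - 1)."""
    if n_rows <= 1:
        return [0.0] * n_rows
    return [i / (n_rows - 1) for i in range(n_rows)]


def shown(total_rows, n):
    """Values show(n) would print IF the window only saw the first n + 1 rows.

    Assumption (not verified against Spark internals): the row limit is
    applied before the window function is evaluated.
    """
    window_size = min(total_rows, n + 1)
    return percent_rank(window_size)[:n]


# Expected: rank computed over all 101 rows, then truncated for display.
expected_3 = percent_rank(101)[:3]
expected_5 = percent_rank(101)[:5]

# Hypothesis: rank computed over only n + 1 rows.
hypothesis_3 = shown(101, 3)
hypothesis_5 = shown(101, 5)

print(expected_3, hypothesis_3)
print(expected_5, hypothesis_5)
```

If the hypothesis lists match the actual result above, that points at the limit being applied before the window is computed, which would change the semantics of the query.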


