Posted to issues@spark.apache.org by "ZygD (Jira)" <ji...@apache.org> on 2022/03/21 12:46:00 UTC

[jira] [Updated] (SPARK-38614) df.show(3) does not equal df.show() first rows

     [ https://issues.apache.org/jira/browse/SPARK-38614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZygD updated SPARK-38614:
-------------------------
    Description: 
*Minimal reproducible example*
{code:python}
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5) {code}
*Expected result*
{code:java}
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows{code}
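For reference, the expected values follow from the percent_rank definition, (rank - 1) / (N - 1), where N is the number of rows in the window partition. The window here has no partitionBy, so the partition is all 101 rows. With unique ids the rank of id k is k + 1, so each step is 1/100. A minimal pure-Python check of that arithmetic (no Spark involved):

```python
# Expected percent_rank for spark.range(101) ordered by id.
# Assumes unique ids, so the 1-based rank of id k is k + 1.
N = 101
expected = []
for k in range(N):
    rank = k + 1
    expected.append((rank - 1) / (N - 1))

print(expected[:5])  # [0.0, 0.01, 0.02, 0.03, 0.04]
```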
*Actual result*
{code:java}
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows{code}
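The actual numbers are exactly what percent_rank yields when the window sees only 4 rows (0/3, 1/3, 2/3) for show(3) and only 6 rows (0/5 to 4/5) for show(5). Since show(n) fetches n + 1 rows to decide whether to print the "only showing top n rows" footer, this suggests the limit is being applied before the window is evaluated. That reading is an assumption, not something this report confirms. The pure-Python sketch below models both evaluation orders; percent_rank and simulate_show are hypothetical helpers, not PySpark APIs.

```python
from bisect import bisect_left


def percent_rank(values):
    """(rank - 1) / (n - 1) per value, ascending; ties share the lowest rank."""
    ordered = sorted(values)
    n = len(values)
    if n == 1:
        return [0.0]
    # bisect_left counts the values strictly smaller than v, i.e. rank - 1.
    return [bisect_left(ordered, v) / (n - 1) for v in values]


def simulate_show(n, total_rows=101, limit_first=False):
    """Hypothetical model of the pr column printed by df.show(n).

    show(n) fetches n + 1 rows; limit_first=True models the window being
    evaluated after that limit instead of over the whole partition.
    """
    ids = list(range(total_rows))
    if limit_first:
        pr = percent_rank(ids[: n + 1])  # window sees only n + 1 rows
    else:
        pr = percent_rank(ids)[: n + 1]  # window sees all rows, then limit
    return pr[:n]  # show(n) prints only the first n fetched rows


print(simulate_show(3))                    # [0.0, 0.01, 0.02]
print(simulate_show(3, limit_first=True))  # [0.0, 0.3333333333333333, 0.6666666666666666]
print(simulate_show(5, limit_first=True))  # [0.0, 0.2, 0.4, 0.6, 0.8]
```

With limit_first=False the model reproduces the expected result; with limit_first=True it reproduces the actual result for both calls.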

  was:
*Minimal reproducible example*

```python
from pyspark.sql import SparkSession, functions as F, Window as W
spark = SparkSession.builder.getOrCreate()
 
df = spark.range(101).withColumn('pr', F.percent_rank().over(W.orderBy('id')))
df.show(3)
df.show(5)
```

*Expected result*

```none
+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
+---+----+
only showing top 3 rows

+---+----+
| id|  pr|
+---+----+
|  0| 0.0|
|  1|0.01|
|  2|0.02|
|  3|0.03|
|  4|0.04|
+---+----+
only showing top 5 rows
```

*Actual result*

```none
+---+------------------+
| id|                pr|
+---+------------------+
|  0|               0.0|
|  1|0.3333333333333333|
|  2|0.6666666666666666|
+---+------------------+
only showing top 3 rows

+---+---+
| id| pr|
+---+---+
|  0|0.0|
|  1|0.2|
|  2|0.4|
|  3|0.6|
|  4|0.8|
+---+---+
only showing top 5 rows
```


> df.show(3) does not equal df.show() first rows
> ----------------------------------------------
>
>                 Key: SPARK-38614
>                 URL: https://issues.apache.org/jira/browse/SPARK-38614
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 3.2.1
>            Reporter: ZygD
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.1#820001)