Posted to issues@spark.apache.org by "Jayce Jiang (Jira)" <ji...@apache.org> on 2020/08/02 01:21:00 UTC

[jira] [Created] (SPARK-32515) Distinct Function Weird Bug

Jayce Jiang created SPARK-32515:
-----------------------------------

             Summary: Distinct Function Weird Bug
                 Key: SPARK-32515
                 URL: https://issues.apache.org/jira/browse/SPARK-32515
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.6
         Environment: Windows 10 and Mac; both show the same issue.

Using Scala version 2.11.12

Python 3.6.10

java version "1.8.0_261"
            Reporter: Jayce Jiang
             Fix For: 2.4.6


Spark displays and counts distinct values incorrectly. After loading my CSV file into Spark, I tried to list all distinct values of one column in the DataFrame. Every approach I tried in Spark returned a wrong answer, but if I convert the Spark DataFrame into a pandas DataFrame, the result is correct. This bug only happens with this one CSV file; all my other CSV files work properly. Please help. Here are the screenshots.

 

!image-2020-08-01-21-19-06-402.png!!image-2020-08-01-21-19-03-289.png!!image-2020-08-01-21-18-58-625.png!
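
One possible explanation, which this report does not confirm, is a quoted field containing a line break. Spark's CSV reader treats every physical line as a record unless the multiLine option is enabled, whereas pandas honours quoted newlines by default. The sketch below uses only the Python standard library and made-up sample data (the column names and values are hypothetical, not taken from the affected file) to show how the two reading strategies can disagree on the distinct values of a column.

```python
import csv
import io

# Hypothetical sample data: record 2 has a quoted field with an embedded newline.
SAMPLE = (
    'id,note,category\n'
    '1,plain,A\n'
    '2,"first line\nsecond line, with comma, and more",B\n'
    '3,plain,A\n'
)


def distinct_naive(text, column_index):
    """Treat every physical line as one record and ignore quoting."""
    values = set()
    for line in text.splitlines()[1:]:  # skip the header line
        parts = line.split(",")
        if column_index < len(parts):
            values.add(parts[column_index])
    return values


def distinct_quoted(text, column_index):
    """Parse with a quote-aware CSV reader, so embedded newlines stay in the field."""
    reader = csv.reader(io.StringIO(text, newline=""))
    next(reader)  # skip the header row
    return {row[column_index] for row in reader}


if __name__ == "__main__":
    print("line-based:  ", sorted(distinct_naive(SAMPLE, 2)))
    print("quote-aware: ", sorted(distinct_quoted(SAMPLE, 2)))
```

If the two results differ on the real file, reading it in Spark with option("multiLine", True), and matching quote and escape settings, may be worth trying before treating the behaviour as a bug in distinct().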



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
