Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/08/05 07:04:00 UTC

[jira] [Resolved] (SPARK-32515) Distinct Function Weird Bug

     [ https://issues.apache.org/jira/browse/SPARK-32515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-32515.
----------------------------------
    Resolution: Not A Problem

> Distinct Function Weird Bug
> ---------------------------
>
>                 Key: SPARK-32515
>                 URL: https://issues.apache.org/jira/browse/SPARK-32515
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.4.6
>         Environment: Windows 10 and Mac (both show the same issue).
> Using Scala version 2.11.12
> Python 3.6.10
> java version "1.8.0_261"
>            Reporter: Jayce Jiang
>            Priority: Major
>         Attachments: Capture.PNG, Capture1.png, Capture2.PNG, image-2020-08-03-07-03-55-716.png, unknown.png, unknown1.png, unknown2.png
>
>
> A weird Spark display and counting error. When I load my CSV file into Spark and try to check all distinct values of a column in a DataFrame, everything I try in Spark gives a wrong answer. But if I convert my Spark DataFrame into a pandas DataFrame, it works. Please help. This bug only happens with this one CSV file; all my other CSV files work properly. Here are the pictures.
>
> [Screenshots: image-2020-08-01-21-19-06-402.png, image-2020-08-01-21-19-03-289.png, image-2020-08-01-21-18-58-625.png]
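The ticket does not record why it was closed as Not A Problem. As a general illustration only, two CSV readers can disagree on the distinct values of a column when the file contains quoted fields with embedded newlines. Spark's CSV reader, for example, parses one physical line per record unless the `multiLine` option is enabled, whereas pandas' `read_csv` accepts quoted newlines by default. The sketch below assumes a small made-up file (`SAMPLE`) and compares two approaches. The first is Python's standard `csv` module, which honours quoting. The second, `naive_distinct_values`, is a hypothetical line-based parser. It is a stand-in used to show the effect and is not Spark's implementation.

```python
import csv
import io


def distinct_values(csv_text, column):
    """Distinct values of `column`, parsed with the stdlib csv module.

    csv.DictReader honours quoted fields, including quoted fields that
    span several physical lines.
    """
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return {row[column] for row in reader}


def naive_distinct_values(csv_text, column):
    """Hypothetical line-based parser (illustration only).

    Treats every physical line as a record and splits on commas, ignoring
    quotes, so a quoted field containing a newline is broken in two.
    """
    lines = csv_text.splitlines()
    idx = lines[0].split(",").index(column)
    values = set()
    for line in lines[1:]:
        fields = line.split(",")
        if idx < len(fields):
            values.add(fields[idx])
    return values


# Made-up sample: the second record's "note" field contains a newline.
SAMPLE = 'city,note\nParis,"ok"\nLyon,"line one\nline two"\nParis,fine\n'

print(sorted(distinct_values(SAMPLE, "city")))        # ['Lyon', 'Paris']
print(sorted(naive_distinct_values(SAMPLE, "city")))  # ['Lyon', 'Paris', 'line two"']
```

The line-based reader reports a spurious third "city" value (`line two"`), which comes from the continuation line of the quoted field. In PySpark the corresponding setting is `spark.read.option("multiLine", True).csv(...)`. Whether that setting applies to the file in this report is not known.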



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
