Posted to issues@spark.apache.org by "JinxinTang (Jira)" <ji...@apache.org> on 2020/08/02 03:30:00 UTC

[jira] [Commented] (SPARK-32515) Distinct Function Weird Bug

    [ https://issues.apache.org/jira/browse/SPARK-32515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169444#comment-17169444 ] 

JinxinTang commented on SPARK-32515:
------------------------------------

Hi [~tigaiii123] ,

Could you please provide code to reproduce the issue? Also, the attached pictures seem to be broken.
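
A small self-contained check like the sketch below would be more useful than screenshots. It only assumes, without confirmation from the report, that the CSV may contain quoted fields with embedded newlines, which is one common way for line-oriented and quote-aware readers to disagree on distinct values. The sample data and the helper names distinct_naive and distinct_quoted are made up for illustration, and the code uses Python's standard csv module rather than Spark.

```python
import csv
import io

# Hypothetical sample data: the first record has a quoted field
# containing an embedded newline.
SAMPLE = 'id,city\n1,"New\nYork"\n2,Boston\n3,Boston\n'


def distinct_naive(text, column):
    """Distinct values when every newline is treated as a record boundary."""
    lines = text.splitlines()
    header = lines[0].split(",")
    idx = header.index(column)
    values = set()
    for line in lines[1:]:
        fields = line.split(",")
        # A record broken by an embedded newline can be too short;
        # record None for the missing column in that case.
        values.add(fields[idx] if idx < len(fields) else None)
    return values


def distinct_quoted(text, column):
    """Distinct values when quoted fields may span several lines."""
    reader = csv.DictReader(io.StringIO(text))
    return {row[column] for row in reader}


if __name__ == "__main__":
    print("naive :", distinct_naive(SAMPLE, "city"))
    print("quoted:", distinct_quoted(SAMPLE, "city"))
```

If the two sets differ for the real file, it may be worth trying Spark's CSV multiLine read option, and attaching a few of the offending rows to the ticket.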

> Distinct Function Weird Bug
> ---------------------------
>
>                 Key: SPARK-32515
>                 URL: https://issues.apache.org/jira/browse/SPARK-32515
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.6
>         Environment: Windows 10 and Mac; both have the same issue.
> Using Scala version 2.11.12
> Python 3.6.10
> java version "1.8.0_261"
>            Reporter: Jayce Jiang
>            Priority: Blocker
>              Labels: distinct, groupby, load, read
>             Fix For: 2.4.6
>
>
> A strange Spark display and counting error. I loaded my CSV file into Spark and tried to check all distinct values of a column in a DataFrame. Everything I tried in Spark gave a wrong answer, but if I convert the Spark DataFrame to a pandas DataFrame, it works. Please help. This bug only happens with this one CSV file; all my other CSV files work properly. Here are the pictures.
>  
> !image-2020-08-01-21-19-06-402.png!!image-2020-08-01-21-19-03-289.png!!image-2020-08-01-21-18-58-625.png!


