Posted to user@spark.apache.org by Rishi Shah <ri...@gmail.com> on 2019/06/12 01:33:57 UTC

[pyspark 2.3+] count distinct returns different value every time it is run on the same dataset

Hi All,

countDistinct on a DataFrame returns a different result every time it is run
on the same dataset. I would expect that with approxCountDistinct, but why
does it happen even with countDistinct()? Is there a way to get an accurate,
deterministic count using pyspark?

-- 
Regards,

Rishi Shah