Posted to user@spark.apache.org by Rishi Shah <ri...@gmail.com> on 2019/06/12 01:33:57 UTC
[pyspark 2.3+] count distinct returns different value every time it
is run on the same dataset
Hi All,
countDistinct on a DataFrame returns a different result every time it is run
on the same dataset. I would expect that from approxCountDistinct, but from
countDistinct() as well? Is there a way to get an accurate, deterministic
count using pyspark?
--
Regards,
Rishi Shah