Posted to user@spark.apache.org by Yana Kadiyska <ya...@gmail.com> on 2015/07/16 15:58:10 UTC

PairRDDFunctions and DataFrames

Hi, could someone point me to the recommended way of using
countApproxDistinctByKey with DataFrames?

I know I can map to a pair RDD, but I'm wondering if there is a simpler
method. If someone knows whether this operation is expressible in SQL, that
information would be most appreciated as well.

Re: PairRDDFunctions and DataFrames

Posted by Michael Armbrust <mi...@databricks.com>.
Instead of using that RDD operation, just use the native DataFrame function
approxCountDistinct:

https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.functions$
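
A minimal sketch of what that could look like, assuming a Spark 1.4-era
setup (SQLContext, local master) and a DataFrame with hypothetical columns
"key" and "value". The object name, the helper perKey, the sample data and
the 0.05 relative error are illustrative assumptions, not something from
this thread. Grouping by the key column and aggregating with
approxCountDistinct plays the role of countApproxDistinctByKey without
dropping down to a pair RDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.approxCountDistinct

object ApproxDistinctByKey {

  // Hypothetical helper: approximate number of distinct "value"s per "key".
  // The second argument is the maximum relative standard deviation allowed
  // for the estimate (0.05 is an arbitrary choice for this sketch).
  def perKey(df: DataFrame): DataFrame =
    df.groupBy("key")
      .agg(approxCountDistinct("value", 0.05).as("approx_distinct"))

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("approx-distinct-by-key")
      .setMaster("local[*]") // assumption: local run, for illustration only
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Made-up sample data: (key, value) pairs.
    val df = sc.parallelize(Seq(("a", 1), ("a", 2), ("a", 2), ("b", 7)))
      .toDF("key", "value")

    // One output row per key, with the estimated distinct count.
    perKey(df).show()

    sc.stop()
  }
}
```

Note that in Spark 2.1 and later this function is named
approx_count_distinct (approxCountDistinct is deprecated there), and an
aggregate of that name can also be called from SQL, e.g.
SELECT key, approx_count_distinct(value) FROM t GROUP BY key, where t is a
hypothetical registered table.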

On Thu, Jul 16, 2015 at 6:58 AM, Yana Kadiyska <ya...@gmail.com>
wrote:

> Hi, could someone point me to the recommended way of using
> countApproxDistinctByKey with DataFrames?
>
> I know I can map to pair RDD but I'm wondering if there is a simpler
> method? If someone knows if this operation is expressible in SQL that
> information would be most appreciated as well.
>