Posted to user@spark.apache.org by Yana Kadiyska <ya...@gmail.com> on 2015/07/23 20:35:45 UTC
Help with Dataframe syntax ( IN / COLLECT_SET)
Hi folks, I'm having trouble expressing IN and COLLECT_SET on a DataFrame. In
other words, I'd like to figure out how to write the following query:
"select collect_set(b),a from mytable where c in (1,2,3) group by a"
I've started with
someDF
.where( -- not sure what to do for c here -- )
.groupBy($"a")
.agg(-- collect_set is not part of sql functions as far as I see...--)
I know I can register a table and run raw SQL, but I'm trying to figure out
the DataFrame route...
Help much appreciated.
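
For reference, here is a minimal sketch of how the query above might look through the DataFrame API. It assumes a newer Spark release (2.x or later), where collect_set is available in org.apache.spark.sql.functions and Column.isin covers the IN predicate; neither assumption comes from the original post. The sample rows and the object name CollectSetSketch are hypothetical and only stand in for "mytable".

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.collect_set

// Hypothetical, self-contained sketch; assumes Spark 2.x+ on the classpath.
object CollectSetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-set-sketch")
      .getOrCreate()
    import spark.implicits._

    // Made-up rows standing in for "mytable" (columns a, b, c).
    val someDF = Seq(
      (1, "x", 1),
      (1, "y", 2),
      (2, "z", 3),
      (2, "w", 9)
    ).toDF("a", "b", "c")

    val result = someDF
      .where($"c".isin(1, 2, 3))   // c IN (1, 2, 3)
      .groupBy($"a")               // GROUP BY a
      .agg(collect_set($"b"))      // COLLECT_SET(b)

    result.show()
    spark.stop()
  }
}
```

On older releases where collect_set is not exposed in the functions object, registering a temporary table and using raw SQL, as mentioned above, remains the fallback.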