Posted to user@spark.apache.org by Yana Kadiyska <ya...@gmail.com> on 2015/07/23 20:35:45 UTC

Help with Dataframe syntax ( IN / COLLECT_SET)

Hi folks, having trouble expressing IN and COLLECT_SET on a dataframe. In
other words, I'd like to figure out how to write the following query:

"select collect_set(b),a from mytable where c in (1,2,3) group by a"

I've started with

  someDF
    .where(/* not sure what to do for c here */)
    .groupBy($"a")
    .agg(/* collect_set is not part of sql functions as far as I can see */)

I know I can register a table and do raw sql but I'm trying to figure out
the DF route...

Help much appreciated.
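
For reference, here is a rough sketch of how the query above might look through the DataFrame API. It assumes a newer Spark release than the one current when this was posted: as far as I know, Column.isin arrived in 1.5 and functions.collect_set in 1.6, and SparkSession needs 2.0 or later. The sample rows, the "b_set" alias, and the CollectSetSketch object name are made up for illustration; someDF stands in for the real table.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_set}

object CollectSetSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumes Spark 2.0+).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-set-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for mytable / someDF.
    val someDF = Seq(
      ("x", 10, 1),
      ("x", 11, 2),
      ("x", 10, 3),
      ("y", 20, 4),
      ("y", 21, 1)
    ).toDF("a", "b", "c")

    val result = someDF
      .where(col("c").isin(1, 2, 3))           // WHERE c IN (1, 2, 3)
      .groupBy(col("a"))                       // GROUP BY a
      .agg(collect_set(col("b")).as("b_set"))  // collect_set(b)

    // Element order inside each collected set is not guaranteed.
    result.show()

    spark.stop()
  }
}
```

On releases that predate these functions, registering a temp table and running the raw SQL is probably still the simplest fallback.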