Posted to user@spark.apache.org by Kumar sp <kr...@gmail.com> on 2019/02/18 14:34:17 UTC

Avoiding Multiple GroupBy

Can we avoid multiple group-bys? I have a million records, and this is a
performance concern.

Below is my query. Even with window functions I guess there is a
performance hit; can you please advise whether there is a better alternative?
I need to get the maximum number of distinct equipments per house across a
list of dates.

 ds.groupBy("house", "date").agg(countDistinct("equiId") as "count").
      drop("date").groupBy("house").agg(max("count") as "noOfEquipments")

Regards,
Kumar