Posted to user@spark.apache.org by KhajaAsmath Mohammed <md...@gmail.com> on 2017/04/27 15:27:27 UTC
Data Skew in Dataframe Groupby - Any suggestions?
Hi,
I am working on a requirement where I need to perform a groupBy on a set of data
and find the max value within each group.
The groupBy on the DataFrame is resulting in skew, and the job runs for quite
a long time (longer than Hive and Impala take for one day's worth
of data).
Any suggestions on how to overcome this?
dataframe.groupBy(Constants.Datapoint.Vin,Constants.Datapoint.Utctime,Constants.Datapoint.ProviderDesc,Constants.Datapoint.Latitude,Constants.Datapoint.Longitude)
Note: I have already added coalesce and persisted the data to memory and disk,
but there is still no improvement.
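
One commonly suggested technique for a skewed max aggregation is two-stage
("salted") aggregation: split each key into random sub-groups, take the max
per sub-group, then take the max of those partial results. The following is
only a minimal sketch of that idea, not a verified fix for this job. The
object name SaltedMax, the helper saltedMax, the value column name passed as
valueCol, the temporary "salt" and "partial_max" column names, and the bucket
count are all hypothetical and not part of the original code.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, floor, max, rand}

object SaltedMax {
  // Hypothetical helper: computes max(valueCol) per key combination in two stages.
  // Assumes the input has no existing columns named "salt" or "partial_max".
  def saltedMax(df: DataFrame, keys: Seq[String], valueCol: String, buckets: Int): DataFrame = {
    // Stage 0: attach a random bucket id in [0, buckets) to every row.
    val salted = df.withColumn("salt", floor(rand() * buckets))

    // Stage 1: aggregate per (keys, salt), so one hot key is spread over
    // up to `buckets` groups instead of a single one.
    val partial = salted
      .groupBy((keys :+ "salt").map(col): _*)
      .agg(max(col(valueCol)).as("partial_max"))

    // Stage 2: combine the partial maxima per original key. This is valid
    // because the max of partial maxima equals the overall max.
    partial
      .groupBy(keys.map(col): _*)
      .agg(max(col("partial_max")).as(s"max_$valueCol"))
  }
}
```

It would be called with the same grouping columns as above (as column name
strings), for example SaltedMax.saltedMax(dataframe, keyColumns, "someValue", 16),
where keyColumns, "someValue" and 16 are placeholders. Whether this helps
depends on where the skew really comes from, so comparing the two query plans
with explain() before adopting it would be worthwhile.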
Thanks,
Asmath.