Posted to user@spark.apache.org by Jorge Machado <jo...@me.com> on 2018/04/19 06:38:01 UTC

Dataframe Defragmentation

Hi Guys, 
At the DataWorks Summit 2018 in Berlin, BMW presented an interesting use case. They gather data in a special format (custom InputFormat). Then they run something like a "reduceByKey", but they don't actually reduce anything; they just try to find the missing pieces without a shuffle.
I think this could be included in the DataFrame API as a function, something like a defragmentation of a DataFrame, where the defragmentation is driven by a user-supplied function.
Is this a reasonable idea?
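
To make the idea a bit more concrete, here is a minimal sketch in plain Python (not Spark code, and not BMW's actual implementation). It assumes each record carries a key, a fragment index and the total number of fragments, and that all fragments of a key already sit in the same partition. The record layout and the names missing_pieces and defragment are hypothetical, just for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (key, fragment_index, total_fragments).
Fragment = Tuple[str, int, int]


def missing_pieces(partition: Iterable[Fragment]) -> Dict[str, List[int]]:
    """Return, per key, the fragment indices absent from this one partition."""
    seen = defaultdict(set)  # key -> fragment indices observed
    totals = {}              # key -> expected number of fragments
    for key, index, total in partition:
        seen[key].add(index)
        totals[key] = total
    return {
        key: [i for i in range(totals[key]) if i not in seen[key]]
        for key in seen
    }


def defragment(partitions: List[List[Fragment]]) -> List[Dict[str, List[int]]]:
    """Process each partition on its own; no records move between partitions."""
    return [missing_pieces(p) for p in partitions]


# Two partitions; key "a" lacks fragment 1 and key "c" lacks fragment 0.
partitions = [
    [("a", 0, 3), ("a", 2, 3), ("b", 0, 1)],
    [("c", 1, 2)],
]
report = defragment(partitions)
```

In Spark terms I would expect this to look roughly like a per-partition function passed to mapPartitions, which is why no shuffle would be needed as long as the fragments of one key are co-located.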


Jorge