Posted to user@spark.apache.org by Jorge Machado <jo...@me.com> on 2018/04/19 06:38:01 UTC
Dataframe Defragmentation
Hi Guys,
At the DataWorks Summit 2018 in Berlin, BMW presented an interesting use case. They gather data in a special format (custom InputFormat). Then they run a "reduceByKey"-style step, but they don't actually reduce anything; they just try to find the missing pieces without a shuffle.
I think this could actually be included in the DataFrame API as a function, something like a defragmentation of a DataFrame, where the defragmentation happens based on a given function.
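
To make the idea a bit more concrete, here is a rough sketch in plain Python (no Spark involved). It is only my reading of the use case, not what BMW actually implemented. The record shape `(key, fragment)`, the names `defragment_partition` and `defragment`, and the merge function are all hypothetical. Each inner list stands in for one partition, and fragments are merged only within a partition, so nothing moves between partitions:

```python
from itertools import groupby
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (key, fragment). Not taken from the talk.
Record = Tuple[str, str]


def defragment_partition(
    records: Iterable[Record],
    merge: Callable[[List[str]], str],
) -> Iterator[Record]:
    """Merge runs of consecutive records that share a key, inside one partition."""
    # groupby only groups *adjacent* records, so no sorting or data movement is needed.
    for key, run in groupby(records, key=lambda record: record[0]):
        yield key, merge([fragment for _, fragment in run])


def defragment(
    partitions: List[List[Record]],
    merge: Callable[[List[str]], str],
) -> List[List[Record]]:
    """Apply the per-partition defragmentation to every partition independently."""
    return [list(defragment_partition(partition, merge)) for partition in partitions]


# Two simulated partitions; key "b" is split across the partition boundary.
partitions = [
    [("a", "he"), ("a", "llo"), ("b", "wor")],
    [("b", "ld"), ("c", "!")],
]

result = defragment(partitions, "".join)
print(result)
```

In this sketch the fragments of "a" are joined, but "b" stays in two pieces because its fragments sit in different partitions. That is the price of avoiding the shuffle. In Spark I would expect the per-partition function to be applied with something like `mapPartitions` on the underlying RDD, but how a DataFrame-level version would look is exactly the open question.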
Is this actually a viable idea?
Jorge