Posted to user@spark.apache.org by SNEHASISH DUTTA <in...@gmail.com> on 2018/04/13 16:26:07 UTC

Shuffling Data After Union and Write

Hi,

I am currently facing an issue while performing a union on three data frames,
say df1, df2 and df3. Once the union is performed and I try to save the
data, the rows get shuffled, so the ordering of the data in df1, df2 and df3
is not maintained.

When I save the data as a text/CSV file, the contents of the file are
shuffled.
I cannot simply sort the dataframe, as these 3 dataframes don't share
any common field/constraint.

Let me know if there is a workaround to maintain the ordering of the
dataframes after the union and write.

Regards,
Snehasish

Re: Shuffling Data After Union and Write

Posted by Rahul Nandi <ra...@gmail.com>.
You can add a new column, say order, to each of the DFs, with the values 1, 2
and 3 for df1, df2 and df3 respectively. Then, after the union, you can sort
the data on that column.