Posted to user@spark.apache.org by SNEHASISH DUTTA <in...@gmail.com> on 2018/04/13 16:26:07 UTC
Shuffling Data After Union and Write
Hi,
I am facing an issue when performing a union on three DataFrames, say df1,
df2 and df3. Once the union is performed and I save the data, the rows get
shuffled, so the ordering of the data from df1, df2 and df3 is not
maintained.
When I save the data as a text/CSV file, the contents are shuffled within
the file.
I cannot simply sort the combined DataFrame, because these three DataFrames
don't share any common field or constraint to order by.
Let me know if there is a workaround to maintain the ordering of the
DataFrames after the union and write.
Regards,
Snehasish
Re: Shuffling Data After Union and Write
Posted by Rahul Nandi <ra...@gmail.com>.
You can add a new column, say order, to each DataFrame, holding 1, 2 and 3
for df1, df2 and df3 respectively. After the union you can sort the data on
that column.