Posted to user@spark.apache.org by Jestin Ma <je...@gmail.com> on 2016/06/29 13:32:37 UTC

Can Spark Dataframes preserve order when joining?

If it’s not too much trouble, could I get some pointers/help on this? (see link)
http://stackoverflow.com/questions/38085801/can-dataframe-joins-in-spark-preserve-order

Also, as a side question, do DataFrames support easy reordering of columns?

Thank you!
Jestin

Re: Can Spark Dataframes preserve order when joining?

Posted by Takeshi Yamamuro <li...@gmail.com>.
Hi,

Most join strategies do not preserve the ordering of the input DataFrames
(sort-merge joins only keep the ordering of the left input DataFrame).
So, as said earlier, you need to sort explicitly if you want ordered
output.
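
One way to make the explicit sort concrete is to tag each row of the left
DataFrame with a position column before the join and sort on it afterwards.
The following is only a rough sketch under assumptions: the helper name
joinKeepingLeftOrder and the scratch column "_row_pos" are made up for
illustration and are not part of Spark, and monotonically_increasing_id
(available from Spark 1.6) only reflects the row order the left DataFrame
has at the moment it is tagged.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.monotonically_increasing_id

// Hypothetical helper (not a Spark API): join on `key`, then put the rows
// back into the order the left input had before the join.
def joinKeepingLeftOrder(left: DataFrame, right: DataFrame, key: String): DataFrame = {
  // "_row_pos" is an assumed scratch column name; choose one that does not
  // clash with an existing column.
  val tagged = left.withColumn("_row_pos", monotonically_increasing_id())
  tagged.
    join(right, key).      // the join itself gives no ordering guarantee
    orderBy("_row_pos").   // restore the left-side order explicitly
    drop("_row_pos")
}
```

If one left row matches several right rows, those rows share the same
position value, so their relative order is still unspecified unless a second
sort column is added.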

// maropu

On Wed, Jun 29, 2016 at 3:38 PM, Mich Talebzadeh <mi...@gmail.com>
wrote:

> Hi,
>
> Well, I would not assume anything myself. If you want to order it, do it
> explicitly.
>
> Let us take a simple case by creating three DFs based on existing tables:
>
> // Assumes HiveContext is an existing instance, e.g.
> // val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
> import org.apache.spark.sql.functions.sum
>
> val s = HiveContext.table("sales").select("AMOUNT_SOLD", "TIME_ID", "CHANNEL_ID")
> val c = HiveContext.table("channels").select("CHANNEL_ID", "CHANNEL_DESC")
> val t = HiveContext.table("times").select("TIME_ID", "CALENDAR_MONTH_DESC")
>
> Now let us join these tables:
>
> val rs = s.join(t, "time_id").join(c, "channel_id").
>   groupBy("calendar_month_desc", "channel_desc").
>   agg(sum("amount_sold").as("TotalSales"))
>
> And do an order explicitly:
>
> // Keep the sorted DataFrame in rs1, then print the first five rows
> val rs1 = rs.orderBy("calendar_month_desc", "channel_desc")
> rs1.take(5).foreach(println)
>
>
> HTH
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 29 June 2016 at 14:32, Jestin Ma <je...@gmail.com> wrote:
>
>> If it’s not too much trouble, could I get some pointers/help on this?
>> (see link)
>>
>> http://stackoverflow.com/questions/38085801/can-dataframe-joins-in-spark-preserve-order
>>
>> -also, as a side question, do Dataframes support easy reordering of
>> columns?
>>
>> Thank you!
>> Jestin
>>
>
>


-- 
---
Takeshi Yamamuro

Re: Can Spark Dataframes preserve order when joining?

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

Well, I would not assume anything myself. If you want to order it, do it
explicitly.

Let us take a simple case by creating three DFs based on existing tables:

// Assumes HiveContext is an existing instance, e.g.
// val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
import org.apache.spark.sql.functions.sum

val s = HiveContext.table("sales").select("AMOUNT_SOLD", "TIME_ID", "CHANNEL_ID")
val c = HiveContext.table("channels").select("CHANNEL_ID", "CHANNEL_DESC")
val t = HiveContext.table("times").select("TIME_ID", "CALENDAR_MONTH_DESC")

Now let us join these tables:

val rs = s.join(t, "time_id").join(c, "channel_id").
  groupBy("calendar_month_desc", "channel_desc").
  agg(sum("amount_sold").as("TotalSales"))

And do an order explicitly:

// Keep the sorted DataFrame in rs1, then print the first five rows
val rs1 = rs.orderBy("calendar_month_desc", "channel_desc")
rs1.take(5).foreach(println)


HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 29 June 2016 at 14:32, Jestin Ma <je...@gmail.com> wrote:

> If it’s not too much trouble, could I get some pointers/help on this? (see
> link)
>
> http://stackoverflow.com/questions/38085801/can-dataframe-joins-in-spark-preserve-order
>
> -also, as a side question, do Dataframes support easy reordering of
> columns?
>
> Thank you!
> Jestin
>