Posted to user@spark.apache.org by Yeikel <em...@yeikel.com> on 2019/03/05 03:13:02 UTC

Difference between One map vs multiple maps

Given a DataFrame df, I could run either
df.map(operation1).map(operation2) or df.map(logic for both operations).
Alternatively, I could run df.map(operation3), where operation3 would be:

return operation2(operation1(row))
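
To make the three variants concrete, here is a minimal sketch in plain Python (deliberately not Spark, so it runs anywhere). The bodies of operation1 and operation2 are hypothetical placeholders I made up for illustration. It only shows that the three forms produce the same result; it says nothing about how Spark would plan or execute them.

```python
def operation1(x: int) -> int:
    # Hypothetical first step: add one.
    return x + 1


def operation2(x: int) -> int:
    # Hypothetical second step: double the value.
    return x * 2


def operation3(x: int) -> int:
    # Single function that composes the two steps.
    return operation2(operation1(x))


rows = [1, 2, 3]

# Variant 1: two chained maps, one per operation.
chained = list(map(operation2, map(operation1, rows)))

# Variant 2: one map with the logic for both operations written inline.
inlined = [(x + 1) * 2 for x in rows]

# Variant 3: one map over the composed function.
composed = list(map(operation3, rows))

print(chained, inlined, composed)
```

All three lists come out identical here, so the question is purely about what happens under the hood in Spark.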


Similarly, with UDFs, I could build a single UDF that does both things, or
two separate UDFs and call them sequentially.

Are there any performance differences (such as converting back and forth
from Tungsten's format) between the two approaches? Or should I focus more on
separation of concerns than on performance in this case?

Thank you.


