Posted to user@spark.apache.org by Advait Mohan Raut <ad...@essexlg.com> on 2017/02/23 10:16:27 UTC

Scala functions for dataframes

Hi Team,

I am using Spark DataFrames from Scala for data operations over CSV files.

Some common transformation code is used by multiple process flows.
Hence I wish to factor it into Scala functions [with def fn_name()].
All process flows will then use the functionality implemented inside these Scala functions.


Typical transformations on the data include the following:

  1.  Modifying multiple columns
  2.  Changing a column conditioned on one or more other columns
  3.  Manipulating date/time formats
  4.  Applying a regex over one or more columns
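
To make the four cases above concrete, here is a rough sketch of the kind of reusable function I have in mind, written against the DataFrame column API only (no SQL strings). The object name, the function names, and the columns "status", "amount" and "country" are made up for illustration; I am not claiming this is the recommended approach.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._

// Hypothetical helper object; every name in it is illustrative only.
object CommonTransforms {

  // 1. Modify multiple columns: trim whitespace in each listed column.
  def trimColumns(names: Seq[String])(df: DataFrame): DataFrame =
    names.foldLeft(df)((acc, name) => acc.withColumn(name, trim(col(name))))

  // 2. Change a column conditioned on other columns
  //    (assumes columns "status", "amount" and "country" exist).
  def flagLargeOrders(df: DataFrame): DataFrame =
    df.withColumn(
      "status",
      when(col("amount") > 1000 && col("country") === "US", lit("REVIEW"))
        .otherwise(col("status")))

  // 3. Date/time format manipulation: re-render a string column
  //    from one date pattern to another.
  def reformatDate(name: String, from: String, to: String)(df: DataFrame): DataFrame =
    df.withColumn(
      name,
      date_format(unix_timestamp(col(name), from).cast("timestamp"), to))

  // 4. Regex over one or more columns: keep digits only.
  def stripNonDigits(names: Seq[String])(df: DataFrame): DataFrame =
    names.foldLeft(df)((acc, name) =>
      acc.withColumn(name, regexp_replace(col(name), "[^0-9]", "")))
}
```

Each function has the shape DataFrame => DataFrame once its first parameter list is applied, so a process flow could chain them with Dataset.transform, e.g. df.transform(CommonTransforms.trimColumns(Seq("name"))).transform(CommonTransforms.flagLargeOrders).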

For such transformations:


  1.  What is the best way to perform these operations?
  2.  Can we do such operations on DataFrames without SQL queries?
  3.  If there is no choice other than running SQL queries, what is the best way to write generic Scala functions for that?
  4.  Suppose all input DataFrames have different schemas but share the same names for the columns we need to process. What is the preferred approach in this case?
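
To illustrate the fourth question, below is a minimal sketch of a function that touches only the shared columns and passes every other column through unchanged, so it does not depend on the rest of the schema. The column names "customer_id" and "country" and the object name are assumptions made up for this example, and I do not know whether this is the preferred pattern.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, upper}

// Hypothetical example; the column names are placeholders.
object SchemaAgnostic {

  // Columns that every input DataFrame is assumed to contain.
  val requiredColumns: Seq[String] = Seq("customer_id", "country")

  // Fails fast if a shared column is missing, then rewrites only
  // "country"; all other columns are left exactly as they were.
  def normalizeShared(df: DataFrame): DataFrame = {
    val missing = requiredColumns.filterNot(c => df.columns.contains(c))
    require(missing.isEmpty, s"Missing columns: ${missing.mkString(", ")}")
    df.withColumn("country", upper(col("country")))
  }
}
```

Because withColumn replaces a column in place and keeps the others, the same function could be applied to inputs whose remaining columns differ.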

Please let me know if you need more clarification on this.




Regards
Advait




