
Posted to user@spark.apache.org by Divya Gehlot <di...@gmail.com> on 2016/02/04 10:28:57 UTC

add new column in the schema + Dataframe

Hi,
I am a beginner in Spark and am using Spark 1.5.2 on YARN (HDP 2.3.4).
I have a use case where I have to read two input files and, based on certain
conditions in the second input file, add a new column to the first
input file and save it.

I am using spark-csv to read my input files.
I would really appreciate it if somebody could share their thoughts on the
best or most feasible way of doing it (using the DataFrame API).
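One way to approach this with the DataFrame API is a left outer join of the first file against the second, followed by withColumn with a when/otherwise condition. The following is a minimal sketch rather than a definitive solution. It assumes Spark 1.5-era APIs (SQLContext). The column names id, amount, status and flag are hypothetical. The key id is assumed to be unique in the second file. The rule status == "blocked" is a hypothetical stand-in for the real conditions. Small in-memory DataFrames replace the spark-csv input.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, lit, when}

object AddFlagFromSecondFile {
  // Hypothetical schemas: first file (id, amount), second file (id, status).
  // Adds "flag" = "Y" to rows of `first` whose id has status "blocked" in `second`, else "N".
  def addFlag(first: DataFrame, second: DataFrame): DataFrame = {
    // Rename the key so the joined result has no ambiguous "id" column.
    val lookup = second.withColumnRenamed("id", "ref_id")

    first
      // Left outer join keeps every row of the first file, matched or not.
      .join(lookup, first("id") === lookup("ref_id"), "left_outer")
      // Unmatched rows have a null status; a null condition does not match,
      // so those rows fall through to otherwise("N").
      .withColumn("flag", when(col("status") === "blocked", lit("Y")).otherwise(lit("N")))
      // Drop the helper columns so only the first file's columns plus "flag" remain.
      .drop("ref_id")
      .drop("status")
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("add-flag-sketch").setMaster("local[1]"))
    val sqlContext = new SQLContext(sc)
    try {
      // In-memory stand-ins for the two CSV inputs (hypothetical data).
      val first = sqlContext.createDataFrame(Seq((1, 10.0), (2, 20.0), (3, 30.0))).toDF("id", "amount")
      val second = sqlContext.createDataFrame(Seq((1, "blocked"), (2, "active"))).toDF("id", "status")

      val result = addFlag(first, second)
      result.show()
      // With spark-csv on the classpath, the result could be saved with, e.g.:
      // result.write.format("com.databricks.spark.csv").option("header", "true").save("/path/to/output")
    } finally {
      sc.stop()
    }
  }
}
```

If the second file can contain several rows per key, deduplicate or aggregate it on that key before the join; otherwise rows of the first file will be repeated in the output.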


Thanks,
Divya

RE: add new column in the schema + Dataframe

Posted by Mohammed Guller <mo...@glassbeam.com>.

Hi Divya,
You can use the withColumn method from the DataFrame API. Here is the method signature:

def withColumn(colName: String, col: Column): DataFrame

(Column API: http://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Column.html)
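A minimal sketch of withColumn in use. It assumes Spark 1.5's SQLContext and a small in-memory DataFrame with hypothetical columns (id, name, age). It adds a constant column with lit and a conditional column with when/otherwise, both from org.apache.spark.sql.functions. The names source and is_adult are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.{col, lit, when}

object WithColumnExample {
  // Builds a small DataFrame (hypothetical data) and adds two derived columns.
  def addColumns(sqlContext: SQLContext): DataFrame = {
    val df = sqlContext
      .createDataFrame(Seq((1, "alice", 42), (2, "bob", 17)))
      .toDF("id", "name", "age")

    df
      // A constant column: lit wraps a literal value as a Column.
      .withColumn("source", lit("file1"))
      // A column computed from an existing one via a condition.
      .withColumn("is_adult", when(col("age") >= 18, lit(true)).otherwise(lit(false)))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("withColumn-sketch").setMaster("local[1]"))
    val sqlContext = new SQLContext(sc)
    try {
      addColumns(sqlContext).show()
    } finally {
      sc.stop()
    }
  }
}
```

withColumn returns a new DataFrame instead of modifying the existing one, so calls are chained and the original df is left unchanged.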


Mohammed
Author: Big Data Analytics with Spark (http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/)
