Posted to user@spark.apache.org by Meeraj Kunnumpurath <me...@servicesymphony.com> on 2016/10/12 17:56:48 UTC
UDF on multiple columns
Hello,
How do I write a UDF that operates on two columns? For example, how do I
introduce a new column that is the product of two columns already in the
dataframe?
Many thanks
Meeraj
Re: UDF on multiple columns
Posted by Meeraj Kunnumpurath <me...@servicesymphony.com>.
This is what I do at the moment:
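import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

def build(path: String, spark: SparkSession) = {
  import spark.implicits._ // enables the 'column symbol syntax

  // The CSV reader yields string columns, so parse the numeric ones as doubles
  val toDouble = udf((x: String) => x.toDouble)
  val df = spark.read.
    option("header", "true").
    csv(path).
    withColumn("sqft_living", toDouble('sqft_living)).
    withColumn("price", toDouble('price)).
    withColumn("bedrooms", toDouble('bedrooms)).
    withColumn("bathrooms", toDouble('bathrooms)).
    withColumn("lat", toDouble('lat)).
    withColumn("long", toDouble('long))
  df.createOrReplaceTempView("sales")
  // The derived columns are computed in SQL rather than with a UDF
  spark.sql("select bedrooms * bedrooms, bedrooms * bathrooms, lat + long, log(sqft_living), price from sales")
}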
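
For the original question, a UDF over two columns might look roughly like the sketch below. It assumes both columns already hold doubles; the names TwoColumnUdf, withProduct and bed_bath are made up for illustration and are not part of the code above. For a plain product, built-in column arithmetic gives the same result without a UDF, so the sketch shows both.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object TwoColumnUdf {
  // A UDF whose function takes two arguments is applied to two columns.
  val product = udf((a: Double, b: Double) => a * b)

  // Hypothetical helper: adds "bed_bath" as bedrooms * bathrooms via the UDF.
  def withProduct(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", product(col("bedrooms"), col("bathrooms")))

  // Same result using built-in column arithmetic, no UDF involved.
  def withProductExpr(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", col("bedrooms") * col("bathrooms"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("two-column-udf")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the sales data.
    val df = Seq((3.0, 2.0), (4.0, 1.5)).toDF("bedrooms", "bathrooms")

    withProduct(df).show()
    withProductExpr(df).show()

    spark.stop()
  }
}
```

The UDF form is mainly useful when the combination of the two columns cannot be expressed with the built-in column operators and functions.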
--
Meeraj Kunnumpurath
Director and Executive Principal
Service Symphony Ltd
00 44 7702 693597
00 971 50 409 0169
meeraj@servicesymphony.com