Posted to user@spark.apache.org by Meeraj Kunnumpurath <me...@servicesymphony.com> on 2016/10/12 17:56:48 UTC

UDF on multiple columns

Hello,

How do I write a UDF that operates on two columns? For example, how do I
introduce a new column that is the product of two columns already in the
dataframe?

Many thanks
Meeraj
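
For reference, a minimal sketch of the two-column case, assuming a DataFrame
with numeric columns named bedrooms and bathrooms. The object name, the
output column name bed_bath and the sample rows are hypothetical. A UDF built
from a two-argument function is called with one Column per argument; for a
plain product, built-in column arithmetic gives the same values without a UDF.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object TwoColumnUdf {
  // A UDF wrapping a two-argument function; it is applied to two Columns.
  val product = udf((a: Double, b: Double) => a * b)

  // Adds the product as a new column using the UDF.
  // "bed_bath" is a hypothetical name chosen for this sketch.
  def withProductUdf(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", product(col("bedrooms"), col("bathrooms")))

  // Same values using built-in column arithmetic, no UDF involved.
  def withProductExpr(df: DataFrame): DataFrame =
    df.withColumn("bed_bath", col("bedrooms") * col("bathrooms"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("two-column-udf")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows, only for illustration.
    val df = Seq((3.0, 2.0), (4.0, 1.5)).toDF("bedrooms", "bathrooms")

    withProductUdf(df).show()
    withProductExpr(df).show()

    spark.stop()
  }
}
```

The built-in expression is usually the better choice when it is enough,
because the optimizer can see through it, while a UDF is opaque to it.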

Re: UDF on multiple columns

Posted by Meeraj Kunnumpurath <me...@servicesymphony.com>.
This is what I do at the moment:

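import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

def build(path: String, spark: SparkSession) = {
  import spark.implicits._ // needed for the 'column symbol syntax
  // Columns read from CSV are strings, so convert them to Double first
  val toDouble = udf((x: String) => x.toDouble)
  val df = spark.read.
    option("header", "true").
    csv(path).
    withColumn("sqft_living", toDouble('sqft_living)).
    withColumn("price", toDouble('price)).
    withColumn("bedrooms", toDouble('bedrooms)).
    withColumn("bathrooms", toDouble('bathrooms)).
    withColumn("lat", toDouble('lat)).
    withColumn("long", toDouble('long))
  df.createOrReplaceTempView("sales")
  // Derived columns are computed in SQL against the temp view
  spark.sql("select bedrooms * bedrooms, bedrooms * bathrooms, " +
    "lat + long, log(sqft_living), price from sales")
}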
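
The same result can probably be reached without the conversion UDF or the
temp view. The following is a sketch under assumptions, not a drop-in
replacement: the object name and the output aliases (bedrooms_sq, bed_bath,
lat_plus_long, log_sqft_living) are hypothetical, and cast("double") may
treat unparsable strings differently from toDouble (with default non-ANSI
settings it yields null instead of throwing).

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, log}

object SalesFeatures {
  // Columns that arrive as strings from the CSV and should become doubles.
  private val numericColumns =
    Seq("sqft_living", "price", "bedrooms", "bathrooms", "lat", "long")

  def build(path: String, spark: SparkSession): DataFrame = {
    val raw = spark.read.option("header", "true").csv(path)

    // Replace each listed column with its double-typed version.
    val typed = numericColumns.foldLeft(raw) { (df, name) =>
      df.withColumn(name, col(name).cast("double"))
    }

    // Derived columns as Column expressions; aliases are hypothetical names.
    typed.select(
      (col("bedrooms") * col("bedrooms")).as("bedrooms_sq"),
      (col("bedrooms") * col("bathrooms")).as("bed_bath"),
      (col("lat") + col("long")).as("lat_plus_long"),
      log(col("sqft_living")).as("log_sqft_living"),
      col("price")
    )
  }
}
```

Naming the derived columns explicitly avoids generated names such as
"(bedrooms * bathrooms)" in the result.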





-- 
Meeraj Kunnumpurath
Director and Executive Principal
Service Symphony Ltd
00 44 7702 693597
00 971 50 409 0169
meeraj@servicesymphony.com