Posted to user@spark.apache.org by anup ahire <ah...@gmail.com> on 2017/03/15 06:04:57 UTC

apply UDFs to N columns dynamically in dataframe

Hello,

I have a schema and the names of the columns to apply a UDF to. The column
names are user input, and their number can differ for each input.

Is there a way to apply UDFs to N columns in a DataFrame?


Thanks !

Re: apply UDFs to N columns dynamically in dataframe

Posted by Hongdi Ren <ry...@gmail.com>.
Since N is decided at runtime, the first idea that comes to mind is to transform the columns into a single vector column (VectorAssembler can do that) and then let the UDF handle the vector, just as many ML transformers do.
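
A minimal sketch of the vector-column idea, assuming Spark 2.x or later and numeric input columns. It uses VectorAssembler from spark.ml to merge the selected columns into one vector column. The sample data, the column names ("a", "b", "c"), the `selected` list, and the summing UDF are all hypothetical placeholders for whatever the user supplies.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

// Local session for illustration only; a real job would reuse its existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("vector-udf-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the user's DataFrame.
val df = Seq((1.0, 2.0, 3.0), (4.0, 5.0, 6.0)).toDF("a", "b", "c")

// Column names chosen at runtime (assumed here to come from user input).
val selected = Seq("a", "c")

// Merge the N selected columns into a single vector column.
val assembler = new VectorAssembler()
  .setInputCols(selected.toArray)
  .setOutputCol("features")

// The UDF receives one Vector no matter how many columns were selected.
val sumVector = udf((v: Vector) => v.toArray.sum)

val result = assembler.transform(df)
  .withColumn("total", sumVector(col("features")))

result.show()
```

This only fits columns that VectorAssembler accepts (numeric, boolean, or vector types), so string columns would need a different route.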

 


Re: apply UDFs to N columns dynamically in dataframe

Posted by Yong Zhang <ja...@hotmail.com>.
Is the answer here good for your case?


http://stackoverflow.com/questions/33151866/spark-udf-with-varargs

scala - Spark UDF with varargs - Stack Overflow
UDFs don't support varargs* but you can pass an arbitrary number of columns wrapped using an array function: import org.apache.spark.sql.functions.{udf, array, lit ...
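
The excerpt above is truncated, so the following is only a sketch of the array-wrapping approach it describes, not the linked answer's code. It assumes all selected columns share one type (here String), since `array` needs a common element type. The sample data, the column names, the `selected` list, and the joining UDF are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, udf}

// Local session for illustration only; a real job would reuse its existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("array-udf-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data standing in for the user's DataFrame.
val df = Seq(("x", "y", "z"), ("p", "q", "r")).toDF("a", "b", "c")

// Column names chosen at runtime (assumed here to come from user input).
val selected = Seq("a", "c")

// The UDF takes a single array argument, which Spark passes in as a Seq.
val joinAll = udf((values: Seq[String]) => values.mkString("-"))

// Wrap the N selected columns in one array column and hand it to the UDF.
val result = df.withColumn("joined", joinAll(array(selected.map(col): _*)))

result.show()
```

Because the column list is expanded with `: _*`, the same UDF works for any number of selected columns.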




