Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/09/16 08:21:02 UTC

[jira] [Resolved] (SPARK-22021) Add a feature transformation to accept a function and apply it on all rows of dataframe

     [ https://issues.apache.org/jira/browse/SPARK-22021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-22021.
-------------------------------
    Resolution: Won't Fix

Agreed, I just can't imagine supporting JavaScript in just one corner of Spark.

> Add a feature transformation to accept a function and apply it on all rows of dataframe
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-22021
>                 URL: https://issues.apache.org/jira/browse/SPARK-22021
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.0
>            Reporter: Hosur Narahari
>
> We often generate derived features in an ML pipeline by applying a mathematical or other operation to DataFrame columns, for example summing a few columns into a new column, or taking the length of a text field such as a message. We currently don't have an efficient way to handle such scenarios in an ML pipeline.
> A transformer that accepts a function and applies it to the specified input columns to produce a numeric output column would give users the flexibility to derive features using any domain-specific logic.
> Example:
> val function = "function(a,b) { return a+b;}"
> val transformer = new GenFuncTransformer().setInputCols(Array("v1", "v2")).setOutputCol("result").setFunction(function)
> val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
> val result = transformer.transform(df)
> result.show
> v1   v2   result
> 1.0  2.0  3.0
> 3.0  4.0  7.0
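
For comparison, the sum-of-columns example above can probably be approximated with a transformer that already ships in Spark ML, without a JavaScript function string. The following is only a sketch under that assumption: it uses the existing SQLTransformer, and the object name DerivedFeatureSketch and method withSum are hypothetical names made up for illustration, not part of this issue or of Spark.

```scala
import org.apache.spark.ml.feature.SQLTransformer
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, named here only for illustration.
object DerivedFeatureSketch {

  // Appends a "result" column holding v1 + v2.
  // SQLTransformer substitutes the input DataFrame for the __THIS__ placeholder.
  def withSum(df: DataFrame): DataFrame = {
    val transformer = new SQLTransformer()
      .setStatement("SELECT *, (v1 + v2) AS result FROM __THIS__")
    transformer.transform(df)
  }

  def main(args: Array[String]): Unit = {
    // Local session, assumed only so the sketch can run standalone.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("derived-feature-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same input data as in the example from the issue description.
    val df = Seq((1.0, 2.0), (3.0, 4.0)).toDF("v1", "v2")
    withSum(df).show()

    spark.stop()
  }
}
```

Because SQLTransformer is itself a Transformer, it can be placed in a Pipeline like any other stage. The text-length case from the description could presumably be handled the same way with a statement such as "SELECT *, length(message) AS message_length FROM __THIS__", where the column names are again assumptions.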


