Posted to issues@spark.apache.org by "Hanan Shteingart (JIRA)" <ji...@apache.org> on 2018/12/26 20:40:00 UTC
[jira] [Created] (SPARK-26449) Dataframe.transform
Hanan Shteingart created SPARK-26449:
----------------------------------------
Summary: Dataframe.transform
Key: SPARK-26449
URL: https://issues.apache.org/jira/browse/SPARK-26449
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.4.0
Reporter: Hanan Shteingart
I would like to chain custom transformations as is suggested in this [blog post|https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55]
This would allow writing something like the following:
{code:python}
from pyspark.sql.functions import lit

def with_greeting(df):
    return df.withColumn("greeting", lit("hi"))

def with_something(df, something):
    return df.withColumn("something", lit(something))

data = [("jose", 1), ("li", 2), ("liz", 3)]
source_df = spark.createDataFrame(data, ["name", "age"])

actual_df = (source_df
    .transform(with_greeting)
    .transform(lambda df: with_something(df, "crazy")))

actual_df.show()
+----+---+--------+---------+
|name|age|greeting|something|
+----+---+--------+---------+
|jose|  1|      hi|    crazy|
|  li|  2|      hi|    crazy|
| liz|  3|      hi|    crazy|
+----+---+--------+---------+
{code}
The only thing needed to accomplish this is the following simple method for DataFrame:
{code:python}
from pyspark.sql.dataframe import DataFrame

def transform(self, f):
    return f(self)

DataFrame.transform = transform
{code}
I volunteer to submit the pull request if this is approved (at least the Python part).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)