Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2020/02/10 02:46:00 UTC
[jira] [Commented] (SPARK-26449) Missing Dataframe.transform API in Python API
[ https://issues.apache.org/jira/browse/SPARK-26449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17033343#comment-17033343 ]
Hyukjin Kwon commented on SPARK-26449:
--------------------------------------
To match the Scala side. It should be easy to work around.
> Missing Dataframe.transform API in Python API
> ---------------------------------------------
>
> Key: SPARK-26449
> URL: https://issues.apache.org/jira/browse/SPARK-26449
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Affects Versions: 2.4.0
> Reporter: Hanan Shteingart
> Assignee: Erik Christiansen
> Priority: Minor
> Fix For: 3.0.0
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> I would like to chain custom transformations as suggested in this [blog post|https://medium.com/@mrpowers/chaining-custom-pyspark-transformations-4f38a8c7ae55].
> This would allow writing something like the following:
>
>
> {code:java}
>
> from pyspark.sql.functions import lit
>
> def with_greeting(df):
>     # Add a constant "greeting" column.
>     return df.withColumn("greeting", lit("hi"))
>
> def with_something(df, something):
>     # Add a constant "something" column holding the given value.
>     return df.withColumn("something", lit(something))
>
> # Assumes an active SparkSession is bound to `spark`.
> data = [("jose", 1), ("li", 2), ("liz", 3)]
> source_df = spark.createDataFrame(data, ["name", "age"])
> actual_df = (source_df
>     .transform(with_greeting)
>     .transform(lambda df: with_something(df, "crazy")))
> actual_df.show()
> +----+---+--------+---------+
> |name|age|greeting|something|
> +----+---+--------+---------+
> |jose|  1|      hi|    crazy|
> |  li|  2|      hi|    crazy|
> | liz|  3|      hi|    crazy|
> +----+---+--------+---------+
> {code}
> The only thing needed to accomplish this is the following simple method for DataFrame:
> {code:java}
> from pyspark.sql.dataframe import DataFrame
>
> def transform(self, f):
>     # Apply f to this DataFrame and return its result, so calls can be chained.
>     return f(self)
>
> DataFrame.transform = transform
> {code}
> I volunteer to do the pull request if approved (at least the Python part).
>
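The chaining pattern from the issue description can be tried without a Spark installation. The following is a minimal sketch under stated assumptions: `FakeFrame` and its `with_column` method are hypothetical stand-ins invented for illustration, not PySpark APIs. The `hasattr` guard is one possible way to avoid overwriting a `transform` method that already exists (the issue lists 3.0.0 as the fix version), and `functools.partial` is shown as an alternative to the lambda for passing extra arguments.

```python
from functools import partial


class FakeFrame:
    """Hypothetical stand-in for a DataFrame; NOT part of PySpark."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def with_column(self, name, value):
        # Return a new object rather than mutating, mirroring withColumn's style.
        return FakeFrame({**self.columns, name: value})


def transform(self, f):
    # Same two-line method as in the issue description.
    return f(self)


# Guarded patch: attach the method only if the class does not already have one.
if not hasattr(FakeFrame, "transform"):
    FakeFrame.transform = transform


def with_greeting(df):
    return df.with_column("greeting", "hi")


def with_something(df, something):
    return df.with_column("something", something)


result = (
    FakeFrame({"name": "jose"})
    .transform(with_greeting)
    # partial binds the extra argument, so transform still receives a one-argument callable.
    .transform(partial(with_something, something="crazy"))
)
print(result.columns)
```

With a real DataFrame the same guard would presumably be applied to `pyspark.sql.dataframe.DataFrame` instead of the stand-in class; that substitution is an assumption and is not exercised here.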