Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/10/08 13:23:20 UTC
[jira] [Commented] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark
[ https://issues.apache.org/jira/browse/SPARK-12916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557964#comment-15557964 ]
Hyukjin Kwon commented on SPARK-12916:
--------------------------------------
I am pretty sure we don't need this, but I would like to cc [~holdenk] here.
> Support Row.fromSeq and Row.toSeq methods in pyspark
> ----------------------------------------------------
>
> Key: SPARK-12916
> URL: https://issues.apache.org/jira/browse/SPARK-12916
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Reporter: Shubhanshu Mishra
> Priority: Minor
> Labels: dataframe, pyspark, row, sql
>
> PySpark should also have access to Row functions such as fromSeq and toSeq, which are exposed in the Scala API.
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row
> This would be useful when constructing custom columns from functions applied to DataFrames. A good example is given in the following Stack Overflow thread:
> http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe
> {code:scala}
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.Row
> def foobarFunc(x: Long, y: Double, z: String): Seq[Any] =
>   Seq(x * y, z.head.toInt * y)
> val schema = StructType(df.schema.fields ++
>   Array(StructField("foo", DoubleType), StructField("bar", DoubleType)))
> val rows = df.rdd.map(r => Row.fromSeq(
>   r.toSeq ++
>   foobarFunc(r.getAs[Long]("x"), r.getAs[Double]("y"), r.getAs[String]("z"))))
> val df2 = sqlContext.createDataFrame(rows, schema)
> df2.show
> // +---+----+---+----+-----+
> // | x| y| z| foo| bar|
> // +---+----+---+----+-----+
> // | 1| 3.0| a| 3.0|291.0|
> // | 2|-1.0| b|-2.0|-98.0|
> // | 3| 0.0| c| 0.0| 0.0|
> // +---+----+---+----+-----+
> {code}
> I am ready to work on this feature.
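
For context on the question of whether the feature is needed: pyspark.sql.Row is a tuple subclass, so list(row) can play the role of toSeq and unpacking a list into a row constructor can play the role of fromSeq. The following is only a rough sketch of the quoted Scala example in plain Python. It uses collections.namedtuple as a stand-in for Row so that it runs without a Spark installation; the names InRow, OutRow, foobar_func and extend_row are hypothetical and not part of any Spark API.

```python
from collections import namedtuple

# Stand-ins for pyspark.sql.Row (assumption: used only so the sketch runs
# without Spark; a real Row is also a tuple subclass with named fields).
InRow = namedtuple("InRow", ["x", "y", "z"])
OutRow = namedtuple("OutRow", ["x", "y", "z", "foo", "bar"])


def foobar_func(x, y, z):
    """Python port of foobarFunc from the quoted Scala example."""
    # ord(z[0]) mirrors z.head.toInt: the code point of the first character.
    return [x * y, ord(z[0]) * y]


def extend_row(row):
    """Append the derived 'foo' and 'bar' values to an existing row."""
    values = list(row)                        # plays the role of Row.toSeq
    extra = foobar_func(row.x, row.y, row.z)
    return OutRow(*(values + extra))          # plays the role of Row.fromSeq


rows = [InRow(1, 3.0, "a"), InRow(2, -1.0, "b"), InRow(3, 0.0, "c")]
extended = [extend_row(r) for r in rows]

for r in extended:
    print(r)
```

In an actual PySpark job, a function shaped like extend_row could presumably be passed to df.rdd.map with pyspark.sql.Row in place of the namedtuples, and the result handed to createDataFrame together with the extended schema; that part is not exercised here.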