Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2016/10/08 13:23:20 UTC

[jira] [Commented] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark

    [ https://issues.apache.org/jira/browse/SPARK-12916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15557964#comment-15557964 ] 

Hyukjin Kwon commented on SPARK-12916:
--------------------------------------

I am pretty sure we don't need this, but I would like to cc [~holdenk] here.

> Support Row.fromSeq and Row.toSeq methods in pyspark
> ----------------------------------------------------
>
>                 Key: SPARK-12916
>                 URL: https://issues.apache.org/jira/browse/SPARK-12916
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark, SQL
>            Reporter: Shubhanshu Mishra
>            Priority: Minor
>              Labels: dataframe, pyspark, row, sql
>
> PySpark should also have access to Row functions such as fromSeq and toSeq, which are exposed in the Scala API.
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row
> This will be useful when constructing custom columns from functions called on DataFrames. A good example is given in the following SO thread:
> http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe
> {code:scala}
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.Row
> def foobarFunc(x: Long, y: Double, z: String): Seq[Any] = 
>   Seq(x * y, z.head.toInt * y)
> val schema = StructType(df.schema.fields ++
>   Array(StructField("foo", DoubleType), StructField("bar", DoubleType)))
> val rows = df.rdd.map(r => Row.fromSeq(
>   r.toSeq ++
>   foobarFunc(r.getAs[Long]("x"), r.getAs[Double]("y"), r.getAs[String]("z"))))
> val df2 = sqlContext.createDataFrame(rows, schema)
> df2.show
> // +---+----+---+----+-----+
> // |  x|   y|  z| foo|  bar|
> // +---+----+---+----+-----+
> // |  1| 3.0|  a| 3.0|291.0|
> // |  2|-1.0|  b|-2.0|-98.0|
> // |  3| 0.0|  c| 0.0|  0.0|
> // +---+----+---+----+-----+
> {code}
> I am ready to work on this feature. 
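
For comparison, here is a minimal Python sketch of the same row-extension pattern. It assumes only that a PySpark Row behaves like a tuple (pyspark.sql.Row subclasses tuple), so a namedtuple is used as a stand-in and the snippet runs without Spark. The names foobar_func, extend_row and ExtendedRow are hypothetical and are not part of any Spark API.

```python
from collections import namedtuple

# Hypothetical stand-in for pyspark.sql.Row, which is likewise a tuple subclass.
Row = namedtuple("Row", ["x", "y", "z"])

# Row type for the extended schema: original fields plus "foo" and "bar".
ExtendedRow = namedtuple("ExtendedRow", Row._fields + ("foo", "bar"))


def foobar_func(x, y, z):
    """Python counterpart of foobarFunc from the Scala snippet."""
    return (x * y, ord(z[0]) * y)


def extend_row(row):
    # tuple(row) plays the role of Row.toSeq; unpacking the combined values
    # into a new row plays the role of Row.fromSeq.
    values = tuple(row) + foobar_func(row.x, row.y, row.z)
    return ExtendedRow(*values)


rows = [Row(1, 3.0, "a"), Row(2, -1.0, "b"), Row(3, 0.0, "c")]
extended = [extend_row(r) for r in rows]

for r in extended:
    print(r)
```

If the same holds for a real Row, tuple(row) and Row(*values) may already cover most of what toSeq and fromSeq provide, though this sketch does not exercise Spark itself.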


