Posted to issues@spark.apache.org by "Shubhanshu Mishra (JIRA)" <ji...@apache.org> on 2016/01/20 05:33:39 UTC
[jira] [Created] (SPARK-12916) Support Row.fromSeq and Row.toSeq methods in pyspark
Shubhanshu Mishra created SPARK-12916:
-----------------------------------------
Summary: Support Row.fromSeq and Row.toSeq methods in pyspark
Key: SPARK-12916
URL: https://issues.apache.org/jira/browse/SPARK-12916
Project: Spark
Issue Type: Improvement
Components: PySpark, SQL
Reporter: Shubhanshu Mishra
Priority: Minor
PySpark should also have access to Row methods such as fromSeq and toSeq, which are exposed in the Scala API:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Row
This will be useful when constructing custom columns from functions applied to DataFrames. A good example is present in the following SO thread:
http://stackoverflow.com/questions/32196207/derive-multiple-columns-from-a-single-column-in-a-spark-dataframe
{code:scala}
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

// Compute the two new column values from the existing x, y, z columns.
def foobarFunc(x: Long, y: Double, z: String): Seq[Any] =
  Seq(x * y, z.head.toInt * y)

// Extend the original schema with the two new columns.
val schema = StructType(df.schema.fields ++
  Array(StructField("foo", DoubleType), StructField("bar", DoubleType)))

// Append the computed values to each row, using toSeq and fromSeq.
val rows = df.rdd.map(r => Row.fromSeq(
  r.toSeq ++
    foobarFunc(r.getAs[Long]("x"), r.getAs[Double]("y"), r.getAs[String]("z"))))

val df2 = sqlContext.createDataFrame(rows, schema)
df2.show
// +---+----+---+----+-----+
// | x| y| z| foo| bar|
// +---+----+---+----+-----+
// | 1| 3.0| a| 3.0|291.0|
// | 2|-1.0| b|-2.0|-98.0|
// | 3| 0.0| c| 0.0| 0.0|
// +---+----+---+----+-----+
{code}
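For reference, pyspark.sql.Row is a subclass of tuple, so the requested methods could be thin wrappers along these lines. This is only an illustrative sketch of the proposed names, not the actual PySpark implementation; the minimal Row stand-in below avoids a Spark dependency entirely.

{code:python}
# Illustrative sketch only: a minimal stand-in for pyspark.sql.Row
# (which is a tuple subclass), showing how fromSeq/toSeq could behave.
class Row(tuple):
    @classmethod
    def fromSeq(cls, values):
        # Build a positional Row from any sequence of values.
        return cls(values)

    def toSeq(self):
        # Return the row's values as a plain list.
        return list(self)

# Mirrors the Scala example above: extend an existing row with
# computed columns, here for the row (1, 3.0, "a").
r = Row.fromSeq([1, 3.0, "a"])
extended = Row.fromSeq(r.toSeq() + [1 * 3.0, ord("a") * 3.0])
print(extended)  # (1, 3.0, 'a', 3.0, 291.0)
{code}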
I am ready to work on this feature.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org