Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2018/01/15 11:36:00 UTC
[jira] [Updated] (SPARK-23074) Dataframe-ified zipwithindex
[ https://issues.apache.org/jira/browse/SPARK-23074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-23074:
------------------------------
Affects Version/s: (was: 2.4.0)
Priority: Minor (was: Major)
You can create a DataFrame from the result of .zipWithIndex on an RDD, as shown here.
There's already a row_number window function in Spark SQL, however, which sounds like the native equivalent?
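For reference, a DataFrame-native sketch of that suggestion, assuming a DataFrame `df` with an illustrative ordering column `someCol` (both names are assumptions, not part of this issue):

{code:java}
import org.apache.spark.sql.functions.{row_number, monotonically_increasing_id}
import org.apache.spark.sql.expressions.Window

// Contiguous 1-based index via the built-in window function.
// Note: an unpartitioned window pulls all rows into a single partition,
// which zipWithIndex avoids.
val byOrder = Window.orderBy("someCol")
val indexed = df.withColumn("id", row_number().over(byOrder))

// Cheaper alternative when gaps are acceptable: unique but
// non-consecutive ids, computed without a shuffle.
val sparseIds = df.withColumn("id", monotonically_increasing_id())
{code}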
> Dataframe-ified zipwithindex
> ----------------------------
>
> Key: SPARK-23074
> URL: https://issues.apache.org/jira/browse/SPARK-23074
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.3.0
> Reporter: Ruslan Dautkhanov
> Priority: Minor
> Labels: dataframe, rdd
>
> Would be great to have a dataframe-friendly equivalent of rdd.zipWithIndex():
> {code:java}
> import org.apache.spark.sql.DataFrame
> import org.apache.spark.sql.types.{LongType, StructField, StructType}
> import org.apache.spark.sql.Row
>
> def dfZipWithIndex(
>     df: DataFrame,
>     offset: Int = 1,
>     colName: String = "id",
>     inFront: Boolean = true
> ): DataFrame = {
>   df.sqlContext.createDataFrame(
>     df.rdd.zipWithIndex.map(ln =>
>       Row.fromSeq(
>         (if (inFront) Seq(ln._2 + offset) else Seq())
>           ++ ln._1.toSeq ++
>           (if (inFront) Seq() else Seq(ln._2 + offset))
>       )
>     ),
>     StructType(
>       (if (inFront) Array(StructField(colName, LongType, false)) else Array[StructField]())
>         ++ df.schema.fields ++
>         (if (inFront) Array[StructField]() else Array(StructField(colName, LongType, false)))
>     )
>   )
> }
> {code}
> credits: [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org