Posted to issues@spark.apache.org by "Ruslan Dautkhanov (JIRA)" <ji...@apache.org> on 2018/01/15 06:26:00 UTC
[jira] [Created] (SPARK-23074) Dataframe-ified zipwithindex
Ruslan Dautkhanov created SPARK-23074:
-----------------------------------------
Summary: Dataframe-ified zipwithindex
Key: SPARK-23074
URL: https://issues.apache.org/jira/browse/SPARK-23074
Project: Spark
Issue Type: New Feature
Components: Spark Core
Affects Versions: 2.3.0, 2.4.0
Reporter: Ruslan Dautkhanov
It would be great to have a DataFrame-friendly equivalent of rdd.zipWithIndex():
{code:java}
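import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{LongType, StructField, StructType}
import org.apache.spark.sql.Row

// Adds a Long index column, computed with rdd.zipWithIndex, to a DataFrame.
def dfZipWithIndex(
    df: DataFrame,
    offset: Int = 1,          // value assigned to the first row
    colName: String = "id",   // name of the new index column
    inFront: Boolean = true   // true: index is the first column; false: the last
): DataFrame = {
  df.sqlContext.createDataFrame(
    // Rebuild each row with the (offset-shifted) index prepended or appended.
    df.rdd.zipWithIndex.map(ln =>
      Row.fromSeq(
        (if (inFront) Seq(ln._2 + offset) else Seq())
          ++ ln._1.toSeq ++
          (if (inFront) Seq() else Seq(ln._2 + offset))
      )
    ),
    // Extend the original schema with a non-nullable Long field in the same position.
    StructType(
      (if (inFront) Array(StructField(colName, LongType, false)) else Array[StructField]())
        ++ df.schema.fields ++
        (if (inFront) Array[StructField]() else Array(StructField(colName, LongType, false)))
    )
  )
}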
{code}
credits: [https://stackoverflow.com/questions/30304810/dataframe-ified-zipwithindex]
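To make the intended semantics of offset and inFront concrete, here is a minimal sketch of the same row assembly on plain Scala collections, with no Spark dependency. The object ZipWithIndexSketch and the function zipRowsWithIndex are hypothetical names used only for illustration; they are not part of Spark or of the helper above.

```scala
// Hypothetical, Spark-free stand-in: each "row" is a plain Seq[Any].
object ZipWithIndexSketch {

  // Mirrors the row assembly of dfZipWithIndex: attach (position + offset)
  // as a Long, either in front of the row or at its end.
  def zipRowsWithIndex(
      rows: Seq[Seq[Any]],
      offset: Int = 1,
      inFront: Boolean = true
  ): Seq[Seq[Any]] =
    rows.zipWithIndex.map { case (row, idx) =>
      val id: Long = idx.toLong + offset
      if (inFront) id +: row else row :+ id
    }

  def main(args: Array[String]): Unit = {
    val rows: Seq[Seq[Any]] = Seq(Seq("a", 10), Seq("b", 20), Seq("c", 30))

    // Defaults: index starts at 1 and becomes the first field of every row.
    println(zipRowsWithIndex(rows))

    // Zero-based index appended as the last field instead.
    println(zipRowsWithIndex(rows, offset = 0, inFront = false))
  }
}
```

With the defaults, the three rows come out with indices 1, 2 and 3 in the first position; with offset = 0 and inFront = false they come out with indices 0, 1 and 2 in the last position.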