Posted to issues@spark.apache.org by "Kan Zhang (JIRA)" <ji...@apache.org> on 2014/06/09 19:03:03 UTC
[jira] [Updated] (SPARK-2079) Skip unnecessary wrapping in List when serializing SchemaRDD to Python
[ https://issues.apache.org/jira/browse/SPARK-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kan Zhang updated SPARK-2079:
-----------------------------
Description:
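Finish the TODO in SchemaRDD.javaToPython, which currently wraps each row's map in a one-element ArrayList before pickling it: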
{code}
private[sql] def javaToPython: JavaRDD[Array[Byte]] = {
  val fieldNames: Seq[String] = this.queryExecution.analyzed.output.map(_.name)
  this.mapPartitions { iter =>
    val pickle = new Pickler
    iter.map { row =>
      val map: JMap[String, Any] = new java.util.HashMap
      // TODO: We place the map in an ArrayList so that the object is pickled to a List[Dict].
      // Ideally we should be able to pickle an object directly into a Python collection so we
      // don't have to create an ArrayList every time.
      val arr: java.util.ArrayList[Any] = new java.util.ArrayList
      row.zip(fieldNames).foreach { case (obj, name) =>
        map.put(name, obj)
      }
      arr.add(map)
      pickle.dumps(arr)
    }
  }
}
{code}
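For illustration only, the effect of the wrapping as seen from the Python side can be sketched with the standard library's pickle module standing in for the JVM-side Pickler. The helper names pickle_row_wrapped and pickle_row_direct and the sample row are hypothetical and are not part of Spark; the sketch assumes each row is pickled as a dict keyed by field name, as in the code above.

```python
import pickle


def pickle_row_wrapped(field_names, row):
    """Mimic the current behaviour: a one-element list holding the row dict."""
    return pickle.dumps([dict(zip(field_names, row))], protocol=2)


def pickle_row_direct(field_names, row):
    """Mimic the proposed behaviour: the row dict pickled on its own."""
    return pickle.dumps(dict(zip(field_names, row)), protocol=2)


# Hypothetical sample data, standing in for one SchemaRDD row.
field_names = ["name", "age"]
row = ("alice", 30)

wrapped = pickle.loads(pickle_row_wrapped(field_names, row))
direct = pickle.loads(pickle_row_direct(field_names, row))

# With wrapping, the reader receives a list and has to unwrap it to reach the row.
print(type(wrapped).__name__, type(direct).__name__)
print(wrapped[0] == direct)
```

In this sketch the wrapped form costs one extra list per row on both sides and a few extra bytes per pickled payload. Whether the real Python-side deserializer can accept a bare dict per record depends on how it is configured, which this sketch does not cover.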
> Skip unnecessary wrapping in List when serializing SchemaRDD to Python
> ----------------------------------------------------------------------
>
> Key: SPARK-2079
> URL: https://issues.apache.org/jira/browse/SPARK-2079
> Project: Spark
> Issue Type: Improvement
> Components: PySpark, SQL
> Affects Versions: 1.0.0
> Reporter: Kan Zhang
> Assignee: Kan Zhang