Posted to dev@sedona.apache.org by GitBox <gi...@apache.org> on 2020/12/13 03:00:11 UTC

[GitHub] [incubator-sedona] jiayuasu commented on a change in pull request #496: [SEDONA-3] Add faster Python conversion from spatial rdd to df.

jiayuasu commented on a change in pull request #496:
URL: https://github.com/apache/incubator-sedona/pull/496#discussion_r541829421



##########
File path: sql/src/main/scala/org/apache/sedona/sql/utils/Adapter.scala
##########
@@ -136,19 +136,60 @@ object Adapter {
     }
   }
 
+  def toGeometryDf[T <: Geometry](spatialRDD: SpatialRDD[T], sparkSession: SparkSession): DataFrame = {
+    val rowRdd = spatialRDD.rawSpatialRDD.rdd.map[Row] {
+      geom =>
+        val userData = geom.getUserData
+        geom.setUserData(null)
+
+        Row.fromSeq(Seq(geom, userData))
+    }
+    var fieldArray = new Array[StructField](2)
+    fieldArray(0) = StructField("_c0", GeometryUDT)
+    fieldArray(1) = StructField("_c1", StringType)
+
+    val schema = StructType(fieldArray)
+    sparkSession.createDataFrame(rowRdd, schema)
+  }
+
+  def toGeometryDf(spatialPairRDD: JavaPairRDD[Geometry, Geometry], sparkSession: SparkSession): DataFrame = {

Review comment:
       I think this is a good improvement: we should convert the geometry type in the RDD directly back to the Geometry type in the DataFrame. However, I believe the improvement you made in toGeometryDf should go into the existing toDf function, rather than maintaining two separate functions. Would you mind granting me write access to your fork so that I can merge the two functions?
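
       For illustration only, a rough sketch of what a single merged entry point might look like, reusing the body from the diff above. The wrapper object name AdapterSketch is hypothetical, the import paths are assumptions about the package layout, and the toString call on the user data is a small defensive change that is not part of the pull request. How the existing toDf currently behaves is not shown in this thread, so this is not a drop-in replacement.

```scala
import org.apache.sedona.core.spatialRDD.SpatialRDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.sedona_sql.UDT.GeometryUDT
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import org.locationtech.jts.geom.Geometry

// Hypothetical wrapper object; in the PR this logic lives in Adapter.
object AdapterSketch {

  // One toDf that emits the geometry as a typed column and the
  // user data as a separate string column.
  def toDf[T <: Geometry](spatialRDD: SpatialRDD[T], sparkSession: SparkSession): DataFrame = {
    val rowRdd = spatialRDD.rawSpatialRDD.rdd.map[Row] { geom =>
      val userData = geom.getUserData
      // As in the diff: detach the user data so it is only carried in _c1.
      // Note that this mutates the geometry objects held by the RDD.
      geom.setUserData(null)
      // Assumption: user data is rendered as a string to match StringType.
      Row(geom, if (userData == null) null else userData.toString)
    }

    val schema = StructType(Seq(
      StructField("_c0", GeometryUDT),
      StructField("_c1", StringType)
    ))
    sparkSession.createDataFrame(rowRdd, schema)
  }
}
```

       Callers would then keep using one function name, and the result would have a geometry column (_c0) plus a string column (_c1).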




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org