Posted to dev@sedona.apache.org by "Brian Rice (Jira)" <ji...@apache.org> on 2022/07/14 01:15:00 UTC

[jira] [Created] (SEDONA-133) Allow user-defined schemas in Adapter.toDf()

Brian Rice created SEDONA-133:
---------------------------------

             Summary: Allow user-defined schemas in Adapter.toDf()
                 Key: SEDONA-133
                 URL: https://issues.apache.org/jira/browse/SEDONA-133
             Project: Apache Sedona
          Issue Type: Improvement
            Reporter: Brian Rice


Hello!

I would like to propose a new overloaded method that supports user-defined schemas in {{Adapter.toDf()}} (for both SpatialRDD and JavaPairRDD). Currently, all fields are coerced to StringType, which does not work for every use case (specifically, I have structs that lose all their nested columns when cast to StringType). I can work around this, but it would be nice to have it available off the shelf. Some sample code from Adapter.scala:

{{cols = cols ++ fieldNames.map(f => StructField(f, StringType))}}
 
{{...}}
 
{{cols = cols ++ leftFieldnames.map(fName => StructField(fName, StringType))}}
{{cols = cols ++ rightFieldNames.map(fName => StructField(fName, StringType))}}
 
My thinking is that the user could provide the schema directly as a StructType object. Anyone who chooses to supply a schema would then be responsible for giving the correct field names and data types.
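
As a rough illustration of the idea, here is a minimal sketch using plain Spark APIs only. The object name {{UserSchemaSketch}} and the helper {{toDfWithSchema}} are hypothetical and not part of Sedona; the geometry column and the actual {{Adapter}} internals are left out, and the field names and values are made up for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object UserSchemaSketch {
  // Example schema with a nested struct column. Under the current behavior
  // such a column would be declared as StringType and lose its nested fields.
  val exampleSchema: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("address", StructType(Seq(
      StructField("city", StringType),
      StructField("zip", IntegerType))))))

  // Hypothetical helper (assumption, not a Sedona API): build the DataFrame
  // with the caller-supplied schema instead of a generated all-string schema.
  // The caller is responsible for the schema matching the rows.
  def toDfWithSchema(rows: RDD[Row], schema: StructType, spark: SparkSession): DataFrame =
    spark.createDataFrame(rows, schema)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("user-schema-sketch")
      .getOrCreate()

    // One row whose second field is itself a Row matching the nested struct.
    val rows = spark.sparkContext.parallelize(Seq(Row("a", Row("Seattle", 98101))))

    val df = toDfWithSchema(rows, exampleSchema, spark)
    df.printSchema()                  // "address" stays a struct
    df.select("address.city").show()  // nested fields remain addressable

    spark.stop()
  }
}
```

With an overload along these lines, the existing methods could keep generating the StringType schema by default, so current callers would not be affected.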
 
I would be happy to work on a PR if it's deemed appropriate. What are your thoughts?


