Posted to dev@sedona.apache.org by GitBox <gi...@apache.org> on 2022/04/15 00:36:05 UTC

[GitHub] [incubator-sedona] ashar236 opened a new pull request, #607: GeoParquet Reader Implemented (SEDONA-94)

ashar236 opened a new pull request, #607:
URL: https://github.com/apache/incubator-sedona/pull/607

   ## Is this PR related to a proposed Issue?
   SEDONA-94 
   ## What changes were proposed in this PR?
   GeoParquet reader support was added.
   ## How was this patch tested?
   Unit tests for three different Parquet files with a geometry column have been added.
   ## Did this PR include necessary documentation updates?
   Yes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-sedona] ashar236 closed pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
ashar236 closed pull request #607: GeoParquet Reader Implemented (SEDONA-94)
URL: https://github.com/apache/incubator-sedona/pull/607




[GitHub] [incubator-sedona] MulesNose commented on pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
MulesNose commented on PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#issuecomment-1116100448

   Great stuff, found your question from Opengeospatial and almost linked you to your own commit!




[GitHub] [incubator-sedona] ericsun95 commented on a diff in pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
ericsun95 commented on code in PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#discussion_r896775498


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/geoparquet/ParquetFileFormat.scala:
##########
@@ -0,0 +1,545 @@
+/*

Review Comment:
   It looks like all of this duplicates Spark's internal Parquet reader implementation. As I understand it, GeoParquet is just a specific kind of Parquet with its own schema.
   In general, what differs here? How about implementing a very lean layer on top of Spark's native Parquet file reader, using the DataSourceV2 interface with a specific schema?
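
To make the "lean layer" idea concrete, here is a minimal sketch in plain Python rather than Spark. It assumes the only GeoParquet-specific parts are the JSON stored under the `geo` key of the file metadata and the WKB-encoded geometry columns; everything else is ordinary Parquet. The function names, the sample metadata values, and the row-as-dict shape are hypothetical and are not taken from this PR or from Sedona.

```python
import json
import struct
from typing import Dict, List, Tuple

# Hypothetical example of the JSON stored under the "geo" key of a
# GeoParquet file's key-value metadata (values are made up for illustration).
GEO_METADATA = json.dumps({
    "version": "0.4.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {"encoding": "WKB", "geometry_type": "Point"},
    },
})


def geometry_columns(geo_json: str) -> List[str]:
    """Return the names of WKB-encoded geometry columns, primary column first."""
    meta = json.loads(geo_json)
    names = [name for name, spec in meta["columns"].items()
             if spec.get("encoding") == "WKB"]
    primary = meta.get("primary_column")
    # False sorts before True, so the primary column moves to the front.
    return sorted(names, key=lambda n: n != primary)


def decode_wkb_point(wkb: bytes) -> Tuple[float, float]:
    """Decode a 2D WKB point; other geometry types are out of scope here."""
    byte_order = "<" if wkb[0] == 1 else ">"  # 1 = little-endian, 0 = big-endian
    (geom_type,) = struct.unpack_from(byte_order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"expected WKB Point (type 1), got type {geom_type}")
    x, y = struct.unpack_from(byte_order + "dd", wkb, 5)
    return (x, y)


def read_row(row: Dict[str, object], geo_json: str) -> Dict[str, object]:
    """Pass plain columns through unchanged; decode only the geometry columns."""
    geom_cols = set(geometry_columns(geo_json))
    return {k: decode_wkb_point(v) if k in geom_cols else v
            for k, v in row.items()}
```

In this sketch the underlying Parquet reader stays untouched and the geometry handling is a post-processing step driven by the metadata. Whether Spark's reader exposes enough hooks to do the same is a separate question.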





[GitHub] [incubator-sedona] jiayuasu commented on a diff in pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on code in PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#discussion_r898455419


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/geoparquet/ParquetFileFormat.scala:
##########
@@ -0,0 +1,545 @@
+/*

Review Comment:
   Thank you for pointing this out. Yes, this is the solution we are working on. We will update this PR as soon as possible.





[GitHub] [incubator-sedona] JassonHua commented on pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
JassonHua commented on PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#issuecomment-1142650283

   Hello, I have received your email and will reply as soon as possible. Thank you!




[GitHub] [incubator-sedona] jiayuasu commented on pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#issuecomment-1143962777

   > Is there any progress on this? I'd love to add Sedona to the list at [opengeospatial/geoparquet#current-implementations--examples](https://github.com/opengeospatial/geoparquet#current-implementations--examples)
   
   We are still working on this. The PR will be updated soon, and we will keep you posted.




[GitHub] [incubator-sedona] neontty commented on a diff in pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
neontty commented on code in PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#discussion_r961791343


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/geoparquet/ParquetFileFormat.scala:
##########
@@ -0,0 +1,545 @@
+/*

Review Comment:
   @jiayuasu If I start working on GeoParquet predicate pushdown, will that work be invalidated if you move to the DataSource V2 API soon? Is there a branch with the V2 API ("[We have a neat version that nicely supports Spark 3.3 only](https://github.com/apache/incubator-sedona/pull/652)")?





[GitHub] [incubator-sedona] cholmes commented on pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
cholmes commented on PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#issuecomment-1142650040

   Is there any progress on this? I'd love to add Sedona to the list at https://github.com/opengeospatial/geoparquet#current-implementations--examples




[GitHub] [incubator-sedona] jiayuasu commented on a diff in pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on code in PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#discussion_r962039035


##########
sql/src/main/scala/org/apache/spark/sql/sedona_sql/io/geoparquet/ParquetFileFormat.scala:
##########
@@ -0,0 +1,545 @@
+/*

Review Comment:
   @neontty My understanding is that the merged PR in Sedona already uses DataSource V2, because the Spark Parquet reader uses it. However, the Spark Parquet reader is designed so that most of its critical functions are private, so we had to copy a lot of its code in order to make a few changes of our own.
   
   The version that supports Spark 3.3 only is here: https://github.com/ashar236/incubator-sedona/tree/geoparquet-spark3.3/sql/src/main/scala/org/apache/spark/sql/execution/datasources/parquet
   
   In short, I believe your GeoParquet predicate pushdown work will not be invalidated, but I would suggest keeping your changes to the codebase as small as possible.
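
For readers unfamiliar with the term, one plausible form of GeoParquet predicate pushdown is file-level pruning: skip any file whose recorded bounding box cannot intersect the query window. The sketch below is plain Python, not Sedona or Spark code. It assumes each file's `geo` metadata carries the optional `bbox` field as `[xmin, ymin, xmax, ymax]`, and it ignores boxes that cross the antimeridian. The function names and file names are hypothetical.

```python
import json
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True when two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def prune_files(file_geo_metadata: Dict[str, str], column: str,
                query: BBox) -> List[str]:
    """Keep only the files whose recorded bbox may contain matching rows."""
    kept = []
    for path, geo_json in file_geo_metadata.items():
        bbox = json.loads(geo_json)["columns"][column].get("bbox")
        # A file without a bbox cannot be ruled out, so it must still be scanned.
        if bbox is None or bbox_intersects(tuple(bbox), query):
            kept.append(path)
    return kept
```

Pruning of this kind only needs to read metadata, which is why it can sit in a thin layer in front of the reader rather than inside the copied Parquet code.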
   
   





[GitHub] [incubator-sedona] jiayuasu commented on pull request #607: GeoParquet Reader Implemented (SEDONA-94)

Posted by GitBox <gi...@apache.org>.
jiayuasu commented on PR #607:
URL: https://github.com/apache/incubator-sedona/pull/607#issuecomment-1334799027

   > Is there any progress on this? I'd love to add Sedona to the list at [opengeospatial/geoparquet#current-implementations--examples](https://github.com/opengeospatial/geoparquet#current-implementations--examples)
   
   @cholmes Dear Chris, we have released Sedona 1.3.0, which includes a reader and writer for GeoParquet. I also created [a PR](https://github.com/opengeospatial/geoparquet/pull/150) in the geoparquet repo. Please review it when you have time :-)

