Posted to dev@sedona.apache.org by "gregleleu (via GitHub)" <gi...@apache.org> on 2023/02/16 00:14:12 UTC

[GitHub] [sedona] gregleleu opened a new pull request, #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

gregleleu opened a new pull request, #770:
URL: https://github.com/apache/sedona/pull/770

   Updates to the R package:
   * reading/writing geoparquet files 
   * propagating column names to the Spark dataframe (sdf) when "fieldNames" is available (e.g. shapefiles)
   
   ## How was this patch tested?
   Added/updated the relevant R tests
   
   ## Did this PR include necessary documentation updates?
   Updated the R documentation only
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sedona.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [sedona] gregleleu commented on a diff in pull request #770: [SEDONA-243] R features: read/write geoparquet, get names from shapefiles

Posted by "gregleleu (via GitHub)" <gi...@apache.org>.
gregleleu commented on code in PR #770:
URL: https://github.com/apache/sedona/pull/770#discussion_r1109092384


##########
R/R/data_interface.R:
##########
@@ -439,6 +439,52 @@ sedona_read_shapefile <- function(sc,
     new_spatial_rdd(NULL)
 }
 
+
+#' Read a geoparquet file into a Spark DataFrame.
+#' Read a geoparquet file into a Spark DataFrame. The created dataframe is automatically registered.
+#'
+#' @param sc A \code{spark_connection}.
+#' @param location Location of the data source.
+#' @param name The name to assign to the newly generated table.
+#'
+#'
+#' @return A SpatialRDD.
+#'
+#' @examples
+#' library(sparklyr)
+#' library(apache.sedona)
+#'
+#' sc <- spark_connect(master = "spark://HOST:PORT")
+#'
+#' if (!inherits(sc, "test_connection")) {
+#'   input_location <- "/dev/null" # replace it with the path to your input file
+#'   rdd <- sedona_read_geoparquet(sc, location = input_location)
+#' }
+#'
+#' @family Sedona data interface functions
+#'
+#' @export
+sedona_read_geoparquet <- function(sc,

Review Comment:
   Regarding the GeoTIFF reader: I started working on it, and reading a GeoTIFF works fine.
   One complication is that the resulting dataframe has a nested schema. The `sparklyr.nested` package helps with that case. I'll need more time to figure out whether we should list it as a dependency or just suggest its use, and whether the tests require it.





[GitHub] [sedona] jiayuasu commented on a diff in pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

Posted by "jiayuasu (via GitHub)" <gi...@apache.org>.
jiayuasu commented on code in PR #770:
URL: https://github.com/apache/sedona/pull/770#discussion_r1107921964


##########
R/R/data_interface.R:
##########
@@ -439,6 +439,52 @@ sedona_read_shapefile <- function(sc,
     new_spatial_rdd(NULL)
 }
 
+
+#' Read a geoparquet file into a Spark DataFrame.
+#' Read a geoparquet file into a Spark DataFrame. The created dataframe is automatically registered.
+#'
+#' @param sc A \code{spark_connection}.
+#' @param location Location of the data source.
+#' @param name The name to assign to the newly generated table.
+#'
+#'
+#' @return A SpatialRDD.
+#'
+#' @examples
+#' library(sparklyr)
+#' library(apache.sedona)
+#'
+#' sc <- spark_connect(master = "spark://HOST:PORT")
+#'
+#' if (!inherits(sc, "test_connection")) {
+#'   input_location <- "/dev/null" # replace it with the path to your input file
+#'   rdd <- sedona_read_geoparquet(sc, location = input_location)
+#' }
+#'
+#' @family Sedona data interface functions
+#'
+#' @export
+sedona_read_geoparquet <- function(sc,

Review Comment:
   Thanks for the explanation. Makes sense to me. Would you please add one more reader/writer `sedona_read_geotiff` and `sedona_write_geotiff`?
   
   It works the same way as the geoparquet: https://sedona.apache.org/latest-snapshot/api/sql/Raster-loader/#geotiff-dataframe-loader.
   
   If you think this is too complicated, we certainly can leave it to another PR. 
   
   





[GitHub] [sedona] gregleleu commented on pull request #770: [SEDONA-243] R features: read/write geoparquet, get names from shapefiles

Posted by "gregleleu (via GitHub)" <gi...@apache.org>.
gregleleu commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432361582

   Do we keep the informative message ("GeoParquet file does not contain valid geo metadata") if we change the class of the exception?




[GitHub] [sedona] jiayuasu commented on pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

Posted by "jiayuasu (via GitHub)" <gi...@apache.org>.
jiayuasu commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432309242

   @gregleleu Each Sedona PR needs to be associated with a JIRA ticket. I can create a Sedona JIRA account for you. This way, you can easily contribute to Sedona R in the future. To do so, I just need your email address. Thanks!




[GitHub] [sedona] jiayuasu commented on pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

Posted by "jiayuasu (via GitHub)" <gi...@apache.org>.
jiayuasu commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432343518

   @gregleleu 
   
   Spark Analysis exception has been there for a while: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala
   
   Not sure why it failed. In fact, we have the same Scala tests on Spark 3.0.3 and it can successfully capture the exception. @Kontinuation Hi Kristin, any idea why this AnalysisException was not found in Spark 3.0.3?
   
   




[GitHub] [sedona] jiayuasu commented on pull request #770: [SEDONA-243] R features: read/write geoparquet, get names from shapefiles

Posted by "jiayuasu (via GitHub)" <gi...@apache.org>.
jiayuasu commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432483118

   @gregleleu #771 has been merged. Please pull from upstream; this will trigger the GitHub Actions checks for this PR.




[GitHub] [sedona] jiayuasu merged pull request #770: [SEDONA-243] R features: read/write geoparquet, get names from shapefiles

Posted by "jiayuasu (via GitHub)" <gi...@apache.org>.
jiayuasu merged PR #770:
URL: https://github.com/apache/sedona/pull/770




[GitHub] [sedona] Kontinuation commented on pull request #770: [SEDONA-243] R features: read/write geoparquet, get names from shapefiles

Posted by "Kontinuation (via GitHub)" <gi...@apache.org>.
Kontinuation commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432417888

   > @Kontinuation Yes, let's change it to `IllegalArgumentException`. Can you make a PR to fix it?
   
   Submitted https://github.com/apache/sedona/pull/771




[GitHub] [sedona] Kontinuation commented on pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

Posted by "Kontinuation (via GitHub)" <gi...@apache.org>.
Kontinuation commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432355733

   > @gregleleu
   > 
   > Spark Analysis exception has been there for a while: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/AnalysisException.scala
   > 
   > Not sure why it failed. In fact, we have the same Scala tests on Spark 3.0.3 and it can successfully capture the exception. @Kontinuation Hi Kristin, any idea why this AnalysisException was not found in Spark 3.0.3?
   
   `AnalysisException` has had some ABI-breaking changes since Spark 3.0, so code compiled against Spark 3.3 can break on Spark 3.0. Can we simply replace its uses with `IllegalArgumentException`?




[GitHub] [sedona] Kontinuation commented on pull request #770: [SEDONA-243] R features: read/write geoparquet, get names from shapefiles

Posted by "Kontinuation (via GitHub)" <gi...@apache.org>.
Kontinuation commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432376907

   > Do we keep the informative message ("GeoParquet file does not contain valid geo metadata") if we change the class of the exception?
   
   Yes, the message will be kept.
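
Since the message text survives the change of exception class, a test that matches on the message rather than on the class should keep passing. A minimal base-R sketch of that kind of check follows; `read_invalid_geoparquet` is a hypothetical stand-in that only simulates the failure, and the class prefix in the simulated message is illustrative, not taken from a real stack trace.

```r
# Hypothetical stand-in for a reader call that fails on a file without
# geo metadata. The class prefix is illustrative; only the message matters.
read_invalid_geoparquet <- function() {
  stop("java.lang.IllegalArgumentException: GeoParquet file does not contain valid geo metadata")
}

# Capture the error message instead of inspecting the exception class.
msg <- tryCatch(
  read_invalid_geoparquet(),
  error = function(e) conditionMessage(e)
)

# The check depends only on the informative text, so swapping
# AnalysisException for IllegalArgumentException does not affect it.
expected <- "GeoParquet file does not contain valid geo metadata"
stopifnot(grepl(expected, msg, fixed = TRUE))
```

In a testthat suite the same idea would usually be written with `expect_error()` and a `regexp` argument.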




[GitHub] [sedona] gregleleu commented on pull request #770: [SEDONA-243] R features: read/write geoparquet, get names from shapefiles

Posted by "gregleleu (via GitHub)" <gi...@apache.org>.
gregleleu commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1433806591

   @jiayuasu I made a few changes; notably, I renamed the write function to be consistent with the rest of the sparklyr ecosystem.




[GitHub] [sedona] jiayuasu commented on pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

Posted by "jiayuasu (via GitHub)" <gi...@apache.org>.
jiayuasu commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432359190

   @Kontinuation Yes, let's change it to `IllegalArgumentException`. Can you make a PR to fix it?
   
   




[GitHub] [sedona] gregleleu commented on pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

Posted by "gregleleu (via GitHub)" <gi...@apache.org>.
gregleleu commented on PR #770:
URL: https://github.com/apache/sedona/pull/770#issuecomment-1432313192

   I created an account, and wanted to create an issue, but JIRA is so complicated I gave up. I'll try again next time.
   
   Regarding the "generic" function, it's not really the R way, cf. sparklyr's documentation (https://spark.rstudio.com/packages/sparklyr/latest/reference/)
   
   Any idea why the checks failed? I'm using the same check as the Scala tests (an exception containing the words "GeoParquet file does not contain valid geo metadata"), but a different kind of exception was raised on Spark 3.0.3.




[GitHub] [sedona] jiayuasu commented on a diff in pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

Posted by "jiayuasu (via GitHub)" <gi...@apache.org>.
jiayuasu commented on code in PR #770:
URL: https://github.com/apache/sedona/pull/770#discussion_r1107896644


##########
R/R/data_interface.R:
##########
@@ -439,6 +439,52 @@ sedona_read_shapefile <- function(sc,
     new_spatial_rdd(NULL)
 }
 
+
+#' Read a geoparquet file into a Spark DataFrame.
+#' Read a geoparquet file into a Spark DataFrame. The created dataframe is automatically registered.
+#'
+#' @param sc A \code{spark_connection}.
+#' @param location Location of the data source.
+#' @param name The name to assign to the newly generated table.
+#'
+#'
+#' @return A SpatialRDD.
+#'
+#' @examples
+#' library(sparklyr)
+#' library(apache.sedona)
+#'
+#' sc <- spark_connect(master = "spark://HOST:PORT")
+#'
+#' if (!inherits(sc, "test_connection")) {
+#'   input_location <- "/dev/null" # replace it with the path to your input file
+#'   rdd <- sedona_read_geoparquet(sc, location = input_location)
+#' }
+#'
+#' @family Sedona data interface functions
+#'
+#' @export
+sedona_read_geoparquet <- function(sc,

Review Comment:
   Can we add a function that takes any format string? That way, the R interface would automatically be able to invoke any new reader we add in the future.
   
   The example usage could be:
   
   Read GeoParquet
   
   ```
   sdf = sedona_read(sc, "geoparquet", location, name)
   ```
   
   Read GeoTiff
   
   ```
   sdf = sedona_read(sc, "geotiff", location, name)
   ```
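
A minimal sketch of what such a generic reader might look like, assuming it simply forwards the format string to sparklyr's `spark_read_source()` and that the Sedona jars on the cluster register that format as a Spark data source. The names `sedona_read` and `sedona_check_format`, and the argument order, are hypothetical and not part of apache.sedona.

```r
# Hypothetical helper: validate the format string before touching Spark.
sedona_check_format <- function(format) {
  if (!is.character(format) || length(format) != 1L ||
      is.na(format) || !nzchar(format)) {
    stop("`format` must be a single non-empty string, e.g. \"geoparquet\"")
  }
  format
}

# Hypothetical generic reader: one entry point for every registered format.
sedona_read <- function(sc, format, location, name = NULL) {
  # Assumption: `format` is a data source name known to the Spark session.
  sparklyr::spark_read_source(
    sc,
    name = name,
    path = location,
    source = sedona_check_format(format)
  )
}
```

With a live connection this would be called as `sedona_read(sc, "geoparquet", location, name)`. The trade-off is that one catch-all function is less discoverable than per-format readers with their own documented arguments.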



##########
R/R/data_interface.R:
##########
@@ -589,6 +635,52 @@ sedona_save_spatial_rdd <- function(x,
   )
 }
 
+
+#' Save a Spark dataframe into a geoparquet file.
+#'
+#' Export spatial from a Spark dataframe into a geoparquet file
+#'
+#' @param x A Spark dataframe object in sparklyr or a dplyr expression
+#'   representing a Spark SQL query.
+#' @param output_location Location of the output file.
+#'
+#'
+#' @return NULL
+#'
+#' @examples
+#' library(sparklyr)
+#' library(apache.sedona)
+#'
+#' sc <- spark_connect(master = "spark://HOST:PORT")
+#'
+#' if (!inherits(sc, "test_connection")) {
+#'   tbl <- dplyr::tbl(
+#'     sc,
+#'     dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
+#'   )
+#'   sedona_save_geoparquet(
+#'     tbl %>% dplyr::mutate(id = 1),
+#'     output_location = "/tmp/pts.geoparquet"
+#'   )
+#' }
+#'
+#' @family Sedona data interface functions
+#'
+#' @export
+sedona_save_geoparquet <- function(x,
+                                   output_location) {

Review Comment:
   Same here:
   
   We could make this save function more generic:
   
   ```
   sedona_save(df, "geoparquet", location)
   ```
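
A minimal sketch of the generic writer, assuming it delegates to sparklyr's `spark_write_source()` and passes the output location as the `path` option of the Spark writer. The name `sedona_save` and its arguments are hypothetical and not part of apache.sedona.

```r
# Hypothetical generic writer: forwards any format string to Spark.
sedona_save <- function(x, format, location, mode = NULL) {
  # Validate inputs before touching Spark.
  stopifnot(
    is.character(format), length(format) == 1L, nzchar(format),
    is.character(location), length(location) == 1L, nzchar(location)
  )
  # Assumption: the writer for `format` reads its target from the
  # "path" option, as Spark's built-in file sources do.
  sparklyr::spark_write_source(
    x,
    source = format,
    mode = mode,
    options = list(path = location)
  )
}
```

With a Spark dataframe `df` this would be called as `sedona_save(df, "geoparquet", location)`.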


