Posted to dev@sedona.apache.org by "jiayuasu (via GitHub)" <gi...@apache.org> on 2023/02/16 00:58:16 UTC

[GitHub] [sedona] jiayuasu commented on a diff in pull request #770: [SEDONA-XXX] R features: read/write geoparquet, get names from shapefiles

jiayuasu commented on code in PR #770:
URL: https://github.com/apache/sedona/pull/770#discussion_r1107896644


##########
R/R/data_interface.R:
##########
@@ -439,6 +439,52 @@ sedona_read_shapefile <- function(sc,
     new_spatial_rdd(NULL)
 }
 
+
+#' Read a geoparquet file into a Spark DataFrame.
+#'
+#' Read a geoparquet file into a Spark DataFrame. The created DataFrame is automatically registered.
+#'
+#' @param sc A \code{spark_connection}.
+#' @param location Location of the data source.
+#' @param name The name to assign to the newly generated table.
+#'
+#'
+#' @return A Spark DataFrame.
+#'
+#' @examples
+#' library(sparklyr)
+#' library(apache.sedona)
+#'
+#' sc <- spark_connect(master = "spark://HOST:PORT")
+#'
+#' if (!inherits(sc, "test_connection")) {
+#'   input_location <- "/dev/null" # replace it with the path to your input file
+#'   rdd <- sedona_read_geoparquet(sc, location = input_location)
+#' }
+#'
+#' @family Sedona data interface functions
+#'
+#' @export
+sedona_read_geoparquet <- function(sc,

Review Comment:
   Can we add a function that takes an arbitrary format string? That way, the R interface can automatically invoke any new reader we add in the future.
   
   The example usage could be:
   
   Read GeoParquet
   
   ```r
   sdf <- sedona_read(sc, "geoparquet", location, name)
   ```
   
   Read GeoTIFF
   
   ```r
   sdf <- sedona_read(sc, "geotiff", location, name)
   ```
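
   A minimal sketch of what such a function could look like, assuming each format string is the name Sedona registers with Spark's data source API (for example `geoparquet`) and a sparklyr version whose `spark_read_source()` accepts a `path` argument. `sedona_read()` is the hypothetical name from the proposal above, not an existing apache.sedona function, and the `reader` argument is an illustrative addition so the dispatch can be exercised without a Spark cluster.

   ```r
   # Hypothetical generic reader sketching the sedona_read() proposal; it is
   # not part of the apache.sedona package. The format string is handed to
   # Spark's data source API via sparklyr::spark_read_source(), so any format
   # Spark can resolve would work without a dedicated R wrapper.
   # `reader` is illustrative: its default is evaluated lazily, so sparklyr is
   # only needed when no replacement reader is supplied.
   sedona_read <- function(sc, format, location, name = NULL, options = list(),
                           ..., reader = sparklyr::spark_read_source) {
     # Reject missing, vectorised, or empty format strings up front.
     stopifnot(is.character(format), length(format) == 1, nzchar(format))
     reader(sc, name = name, path = location, source = format,
            options = options, ...)
   }

   # Usage (requires a Sedona-enabled Spark connection):
   # sdf <- sedona_read(sc, "geoparquet", "/path/to/data.parquet", "my_table")
   ```

   Forwarding the format string unchanged means the R package keeps no list of supported formats, so a new reader becomes usable as soon as Spark can resolve its name; the trade-off is that a misspelled format would only fail at read time, when Spark cannot find the data source.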



##########
R/R/data_interface.R:
##########
@@ -589,6 +635,52 @@ sedona_save_spatial_rdd <- function(x,
   )
 }
 
+
+#' Save a Spark dataframe into a geoparquet file.
+#'
+#' Export spatial data from a Spark dataframe into a geoparquet file.
+#'
+#' @param x A Spark dataframe object in sparklyr or a dplyr expression
+#'   representing a Spark SQL query.
+#' @param output_location Location of the output file.
+#'
+#'
+#' @return NULL
+#'
+#' @examples
+#' library(sparklyr)
+#' library(apache.sedona)
+#'
+#' sc <- spark_connect(master = "spark://HOST:PORT")
+#'
+#' if (!inherits(sc, "test_connection")) {
+#'   tbl <- dplyr::tbl(
+#'     sc,
+#'     dplyr::sql("SELECT ST_GeomFromText('POINT(-71.064544 42.28787)') AS `pt`")
+#'   )
+#'   sedona_save_geoparquet(
+#'     tbl %>% dplyr::mutate(id = 1),
+#'     output_location = "/tmp/pts.geoparquet"
+#'   )
+#' }
+#'
+#' @family Sedona data interface functions
+#'
+#' @export
+sedona_save_geoparquet <- function(x,
+                                   output_location) {

Review Comment:
   Same here:
   
   We could make this save function more generic:
   
   ```r
   sedona_save(df, "geoparquet", location)
   ```
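
   The same pattern could cover writing. This is again a hypothetical sketch, not an apache.sedona function, built on sparklyr's `spark_write_source()`. It assumes the destination can be given through the `path` option, which Spark's generic `DataFrameWriter.save()` reads; the `writer` argument is illustrative only.

   ```r
   # Hypothetical generic writer sketching the sedona_save() proposal; it is
   # not part of the apache.sedona package. The format string is forwarded to
   # sparklyr::spark_write_source(); the output location is passed as the
   # "path" option, which Spark's DataFrameWriter.save() uses as destination.
   # `writer` is illustrative and defaults lazily to the sparklyr function.
   sedona_save <- function(x, format, location, mode = NULL, options = list(),
                           ..., writer = sparklyr::spark_write_source) {
     # Reject missing, vectorised, or empty format strings up front.
     stopifnot(is.character(format), length(format) == 1, nzchar(format))
     writer(x, source = format, mode = mode,
            options = c(options, list(path = location)), ...)
     invisible(NULL)
   }
   ```

   With a Sedona-enabled connection, the call from the proposal would be `sedona_save(tbl, "geoparquet", "/tmp/pts.geoparquet")`; `mode` is passed through so callers can choose, for example, `"overwrite"`.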



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
