You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "thisisnic (via GitHub)" <gi...@apache.org> on 2023/05/11 09:51:12 UTC

[GitHub] [arrow] thisisnic opened a new pull request, #35543: GH-35542 : [R] Impement schema extraction function

thisisnic opened a new pull request, #35543:
URL: https://github.com/apache/arrow/pull/35543

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1545852638

   Yes, but then `schema()` both constructs a schema AND pulls the schema attribute...one function doing more than one thing is also confusing although there's certainly precedent for it in pre-tidyverse R.
   
   The `as_schema()` method is perhaps more appropriate (and also perhaps easier to remember than "infer" or "extract")?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1545866555

   > Yes, but then `schema()` both constructs a schema AND pulls the schema attribute...one function doing more than one thing is also confusing although there's certainly precedent for it in pre-tidyverse R.
   > 
   
   Yeah I totally get your concern and this is why I didn't do this before, despite many times typing `schema(ds)` and then remembering that instead I had to type `ds$schema` to get the schema. I'm just not convinced (anymore) that it's less confusing than the status quo, where `getter()` exists for some attributes but not others.
   
   > The `as_schema()` method is perhaps more appropriate (and also perhaps easier to remember than "infer" or "extract")?
   
   I'm not sure that feels natural in these cases: `as_schema(ds)` feels like I'm converting the Dataset to something else, where what I really want is to access the Schema that the Dataset contains.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1192329817


##########
r/R/schema.R:
##########
@@ -240,10 +226,45 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create
+
+#' @export
+schema.ArrowTabular <- function(x) x$schema
+
+#' @export
+schema.RecordBatchReader <- function(x) x$schema
+
+#' @export
+schema.Dataset <- function(x) x$schema

Review Comment:
   Can you also add this for arrow_dplyr_query? Might involve calling implicit_schema, or `x[["schema"]] %||% implicit_schema(x)` or something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1546307021

   While I agree that it's not the best that `schema()` would do both construction and extraction here, the aim here was to come up with something intuitive for users that avoids having to use the `$` for extraction, and I don't think that any of the alternatives quite fit that; I agree that the `as_*` functions sound more conversion than extraction.  I just spent ~30 minutes browsing through docs for other packages (mainly tidyverse/tidymodels/r-lib etc) to see how they handle this kind of thing, but there aren't any good analogous examples.
   
   I also considered whether there are other attributes we might want to implement extraction functions like this for, but I think schemas are the only case and so here's no general problem to solve.  Given that multiple people (Neal, myself, and also Francois on the original ticket) report having tried to use `schema()` to extract a schema despite knowing that `schema()` is (also) the function for creating schemas, it might be the best intuitive option we have right now?
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1237249856


##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {
+    return(infer_schema(dots[[1]]))
+  }
+
+  Schema$create(...)

Review Comment:
   If the usage elsewhere is like this then I'm happy to change it here.  You didn't say that in your previous feedback. I don't really care what we do here given it doesn't actually matter in any meaningful way.  I will update it to your suggestion.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1237227776


##########
r/R/dplyr-collect.R:
##########
@@ -192,5 +192,6 @@ implicit_schema <- function(.data) {
       aggregate_types(.data, hash, old_schm)
     )
   }
-  schema(!!!new_fields)
+
+  schema(new_fields)

Review Comment:
   If you feel strongly about the ability to pass a bare list instead of `!!!bare_list` I think this is a better fit for a separate PR: it is not related to the ability to use `schema()` on a Dataset/ArrowTabular/arrow_dplyr_query and is a departure from behaviour elsewhere in the package (e.g., none of `record_batch()`, `arrow_table()`, or `struct()` support this).



##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {
+    return(infer_schema(dots[[1]]))
+  }
+
+  Schema$create(...)

Review Comment:
   As above, if you feel strongly about this I think it is a better fit for another PR: our usage everywhere else in the package is to capture the dots once using `list2()` and use `!!!` if they need to be passed on to another function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1237308764


##########
r/tests/testthat/test-schema.R:
##########
@@ -292,3 +292,9 @@ test_that("schema name assignment", {
   expect_identical(names(schm2), c("col1", "col2"))
   expect_identical(names(schm2$r_metadata$columns), c("col1", "col2"))
 })
+
+test_that("schema extraction", {
+  tbl <- arrow_table(example_data)
+  schema(tbl)
+  expect_equal(schema(tbl), tbl$schema)
+})

Review Comment:
   I'd thought I'd commented here but apparently not; we already test `schema(some_list)` further up:
   
   https://github.com/apache/arrow/blob/5854d0405366c025427919470327318f0089e03a/r/tests/testthat/test-schema.R#L254-L268



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1545773083

   > I am wondering if the function name schema is too generic and how about to be changed to something like arrow_schmea?
   
   I think that's a good point...particularly since it's an S3 method. `infer_schema()` or `extract_schema()` might also be good choices...we already have `as_schema()` for converting a schema-like object to a Schema. (FWIW I use `infer_nanoarrow_schema()` to do this kind of thing).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1233962048


##########
r/R/schema.R:
##########
@@ -240,10 +226,48 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create

Review Comment:
   I don't understand this; how can `..object` be missing, but `...` be present?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1545804115

   > > I am wondering if the function name schema is too generic and how about to be changed to something like arrow_schmea?
   > 
   > I think that's a good point...particularly since it's an S3 method. `infer_schema()` or `extract_schema()` might also be good choices...we already have `as_schema()` for converting a schema-like object to a Schema. (FWIW I use `infer_nanoarrow_schema()` to do this kind of thing in nanoarrow).
   
   `schema()` exists as a function now (and has since the beginning of the package), this PR seems to be adding additional cases for using it to extract the `$schema` attribute from Arrow objects, and it's doing it via S3 methods rather than `if`s.  If we're worried about the name "schema" colliding with other packages, we would have already seen it already--has anyone?
   
   For what it's worth, I'm personally not a fan of renaming to `verb_schema()`: it's pulling an attribute of the object, so it feels natural that the accessor have the same name as the attribute, as we have e.g. `length.ArrowDatum <- function(x) x$length()`. As a (👴) developer, I don't want to have to remember what `verb_` goes in front, I'd rather just `$schema`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1546311188

   Though it looks like it's not possible to get this working sensibly as an S3 generic anyway, may need to redesign.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1233989967


##########
r/R/schema.R:
##########
@@ -240,10 +226,48 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create

Review Comment:
   (i.e., if you only pass named arguments, `...` will be present but `..object` will not)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1236506644


##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {
+    return(infer_schema(dots[[1]]))
+  }
+
+  Schema$create(...)
+}
+
+#' Create a schema or extract one from an object.
+#'
+#' @param x An object which has a schema, e.g. a `Dataset`
+#' @export
+infer_schema <- function(x) {
+  UseMethod("infer_schema")
+}
+
+#' @export
+infer_schema.ArrowTabular <- function(x) x$schema
+
+#' @export
+infer_schema.RecordBatchReader <- function(x) x$schema
+
+#' @export
+infer_schema.Dataset <- function(x) x$schema
+

Review Comment:
   That feels beyond the scope of this PR, happy to do it in a follow-up if you open a ticket.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1233989221


##########
r/R/schema.R:
##########
@@ -240,10 +226,48 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create

Review Comment:
   I think it can only be characterized as cruel sorcery invented by the S3 dispatch gods:
   
   ``` r
   schema <- function(..object, ...) {
     UseMethod("schema")
   }
   
   schema.default <- function(..object, ...) {
     if (missing(..object)) {
       rlang::list2(...)
     } else {
       rlang::list2(..object, ...)
     }
   }
   
   schema()
   #> list()
   schema(x = 2)
   #> $x
   #> [1] 2
   schema(2)
   #> [[1]]
   #> [1] 2
   ```
   
   <sup>Created on 2023-06-19 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1546651522

   Thanks @paleolimbot, sounds like a good middle ground there, will give that a go.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eitsupi commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "eitsupi (via GitHub)" <gi...@apache.org>.
eitsupi commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1545602999

   I am wondering if the function name `schema` is too generic and how about to be changed to something like `arrow_schmea`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "nealrichardson (via GitHub)" <gi...@apache.org>.
nealrichardson commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1192802928


##########
r/R/schema.R:
##########
@@ -240,10 +226,45 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create
+
+#' @export
+schema.ArrowTabular <- function(x) x$schema
+
+#' @export
+schema.RecordBatchReader <- function(x) x$schema
+
+#' @export
+schema.Dataset <- function(x) x$schema

Review Comment:
   I seem to recall that when we collapse a query and call implicit_schema, it assigns $schema in the nested adq, but maybe I remember wrong. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1235299471


##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {

Review Comment:
   I think that `length(dots) == 1 && rlang::is_named2(dots[[1]])` may be sufficient if (1) we add `infer_schema.Field(x, ...) Schema$create(x)` and (2) we go back to using `!!!items`. If we do allow a single list to be passed as the first argument, `rlang::is_bare_list()` may be a better choice than `is.list()` (since `is.list()` will return TRUE for S3 objects that are a list under the hood like a data.frame).



##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {
+    return(infer_schema(dots[[1]]))
+  }
+
+  Schema$create(...)
+}
+
+#' Create a schema or extract one from an object.

Review Comment:
   ```suggestion
   #' Extract a schema from an object
   ```
   
   (maybe?)



##########
r/tests/testthat/test-schema.R:
##########
@@ -292,3 +292,9 @@ test_that("schema name assignment", {
   expect_identical(names(schm2), c("col1", "col2"))
   expect_identical(names(schm2$r_metadata$columns), c("col1", "col2"))
 })
+
+test_that("schema extraction", {
+  tbl <- arrow_table(example_data)
+  schema(tbl)
+  expect_equal(schema(tbl), tbl$schema)
+})

Review Comment:
   Some other tests that may be useful:
   
   - Test `schema(!!!some_list)` (or `schema(some_list)` if you choose that direction)
   - One test for each type of object that can have a schema extracted with `schema()` 



##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {
+    return(infer_schema(dots[[1]]))
+  }
+
+  Schema$create(...)
+}
+
+#' Create a schema or extract one from an object.
+#'
+#' @param x An object which has a schema, e.g. a `Dataset`
+#' @export
+infer_schema <- function(x) {
+  UseMethod("infer_schema")
+}
+
+#' @export
+infer_schema.ArrowTabular <- function(x) x$schema
+
+#' @export
+infer_schema.RecordBatchReader <- function(x) x$schema
+
+#' @export
+infer_schema.Dataset <- function(x) x$schema
+

Review Comment:
   I think you added a way to do this for `data.frame` already that might make sense to wire up here?



##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {
+    return(infer_schema(dots[[1]]))
+  }
+
+  Schema$create(...)

Review Comment:
   ```suggestion
     Schema$create(!!!dots)
   ```
   
   (this is what I usually see after capturing a dots expression but I forget the details/if it matters)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1236510243


##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {

Review Comment:
   I thought about this, but `infer_schema.Field()` doesn't quite fit, as we're creating rather than inferring a schema with it. 
   
   Thanks for the list suggestion, added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1237238768


##########
r/tests/testthat/test-schema.R:
##########
@@ -292,3 +292,18 @@ test_that("schema name assignment", {
   expect_identical(names(schm2), c("col1", "col2"))
   expect_identical(names(schm2$r_metadata$columns), c("col1", "col2"))
 })
+
+test_that("schema extraction", {
+  tbl <- arrow_table(example_data)

Review Comment:
   This might need `skip_if_not_available("dataset")` to pass the minimal nightly tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1543690602

   * Closes: #35542


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1192889153


##########
r/R/schema.R:
##########
@@ -240,10 +226,48 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create

Review Comment:
   I think you could get this to work if you did something like:
   
   ```
   #' @export
   schema <- function(..object, ...) {
     UseMethod("schema")
   }
   
   schema.default <- function(..object, ...) {
     if (missing(..object)) {
       Schema$create(...)
     } else {
       Schema$create(..object, ...)
     }
   }
   ```
   
   (Using `..object` so that it's unlikely to match as a column name, although then some things like `schema(!!! stuff)` won't work properly)



##########
r/R/schema.R:
##########
@@ -240,10 +226,45 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create
+
+#' @export
+schema.ArrowTabular <- function(x) x$schema
+
+#' @export
+schema.RecordBatchReader <- function(x) x$schema
+
+#' @export
+schema.Dataset <- function(x) x$schema

Review Comment:
   It's a tiny bit of a hack, but in nanoarrow I did:
   
   ``` r
   nanoarrow:::infer_nanoarrow_schema.arrow_dplyr_query
   #> function (x, ...) 
   #> {
   #>     infer_nanoarrow_schema.RecordBatchReader(arrow::as_record_batch_reader(x))
   #> }
   #> <bytecode: 0x1076e60a8>
   #> <environment: namespace:nanoarrow>
   ```
   
   <sup>Created on 2023-05-12 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>
   
   (which works because `as_record_batch_reader()` on a query doesn't start evaluating but does record the calculated final schema)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Impement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1192770802


##########
r/R/schema.R:
##########
@@ -240,10 +226,45 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...){
+  UseMethod("schema")
+}
+
 #' @export
-#' @rdname Schema
-schema <- Schema$create
+schema.default <- Schema$create
+
+#' @export
+schema.ArrowTabular <- function(x) x$schema
+
+#' @export
+schema.RecordBatchReader <- function(x) x$schema
+
+#' @export
+schema.Dataset <- function(x) x$schema

Review Comment:
   Will do.  Just to check, when do we end up with `arrow_dplyr_query` objects where `x[["schema"]]` isn't `NULL`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1236516348


##########
r/R/schema.R:
##########
@@ -244,10 +230,59 @@ print_schema_fields <- function(s) {
   paste(map_chr(s$fields, ~ .$ToString()), collapse = "\n")
 }
 
-#' @param ... [fields][field] or field name/[data type][data-type] pairs
+#' Schemas
+#'
+#' Create a schema or extract one from an object.
+#'
+#' @seealso [Schema] for detailed documentation of the Schema R6 object
+#' @param ... [fields][field], field name/[data type][data-type] pairs, or object from which to extract a schema
+#' @examples
+#' # Create schema using pairs of field names and data types
+#' schema(a = int32(), b = float64())
+#'
+#' # Create schema using fields
+#' schema(
+#'   field("b", double()),
+#'   field("c", bool(), nullable = FALSE),
+#'   field("d", string())
+#' )
+#'
+#' # Extract schemas from objects
+#' df <- data.frame(col1 = 2:4, col2 = c(0.1, 0.3, 0.5))
+#' tab1 <- arrow_table(df)
+#' schema(tab1)
+#' tab2 <- arrow_table(df, schema = schema(col1 = int8(), col2 = float32()))
+#' schema(tab2)
+#' @export
+schema <- function(...) {
+  dots <- list2(...)
+
+  if (length(dots) == 1 && !is.list(dots[[1]]) && is.null(names(dots)) && !inherits(dots[[1]], "Field")) {
+    return(infer_schema(dots[[1]]))
+  }
+
+  Schema$create(...)

Review Comment:
   Nah, it doesn't make any difference.  I've intentionally left it as-is as the intention is more obvious like that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic merged pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic merged PR #35543:
URL: https://github.com/apache/arrow/pull/35543


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1237279399


##########
r/R/dplyr-collect.R:
##########
@@ -192,5 +192,6 @@ implicit_schema <- function(.data) {
       aggregate_types(.data, hash, old_schm)
     )
   }
-  schema(!!!new_fields)
+
+  schema(new_fields)

Review Comment:
   I don't understand your comment; this is functionality we already support rather than functionality I added?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1237317960


##########
r/tests/testthat/test-schema.R:
##########
@@ -292,3 +292,18 @@ test_that("schema name assignment", {
   expect_identical(names(schm2), c("col1", "col2"))
   expect_identical(names(schm2$r_metadata$columns), c("col1", "col2"))
 })
+
+test_that("schema extraction", {
+  tbl <- arrow_table(example_data)

Review Comment:
   Thanks, updated!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1238546608


##########
r/R/dplyr-collect.R:
##########
@@ -192,5 +192,6 @@ implicit_schema <- function(.data) {
       aggregate_types(.data, hash, old_schm)
     )
   }
-  schema(!!!new_fields)
+
+  schema(new_fields)

Review Comment:
   I'm sorry, I had no idea it was an existing feature. Could it be in the documentation for `...` in `schema()`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on code in PR #35543:
URL: https://github.com/apache/arrow/pull/35543#discussion_r1238604905


##########
r/R/dplyr-collect.R:
##########
@@ -192,5 +192,6 @@ implicit_schema <- function(.data) {
       aggregate_types(.data, hash, old_schm)
     )
   }
-  schema(!!!new_fields)
+
+  schema(new_fields)

Review Comment:
   Actually, yeah, I should totally add it to those docs (and perhaps an example too) - if it's not obvious to you as one of the main devs here, it won't be obvious to others either!  Thanks for catching this!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #35543: GH-35542 : [R] Implement schema extraction function

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #35543:
URL: https://github.com/apache/arrow/pull/35543#issuecomment-1607452122

   Conbench analyzed the 6 benchmark runs on commit `29a339f5`.
   
   There was 1 benchmark result indicating a performance regression:
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-06-25 16:55:21Z](http://conbench.ursa.dev/compare/runs/c886559fb9144d88a77d81d44362bfed...006d2c969c8a40909c8fe418251f769f/)
     - [params=1048576/1, source=cpp-micro, suite=arrow-acero-aggregate-benchmark](http://conbench.ursa.dev/compare/benchmarks/064984ad367778ea8000c90ff066cca4...0649871b430478ee8000b83a31ce169c)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14555793421) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org