Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/02 21:43:56 UTC

[GitHub] [arrow] nealrichardson commented on a diff in pull request #12826: ARROW-15260: [R] open_dataset - add file_name as column

nealrichardson commented on code in PR #12826:
URL: https://github.com/apache/arrow/pull/12826#discussion_r936050363


##########
r/NEWS.md:
##########
@@ -69,6 +69,12 @@
 * The `arrow.dev_repo` for nightly builds of the R package and prebuilt
   libarrow binaries is now https://nightlies.apache.org/arrow/r/.
 * Brotli and BZ2 are shipped with MacOS binaries. BZ2 is shipped with Windows binaries. (ARROW-16828)
+* `lubridate::parse_date_time()` datetime parser:
+  * `orders` with year, month, day, hours, minutes, and seconds components are supported.
+  * the `orders` argument in the Arrow binding works as follows: `orders` are transformed into `formats`, which are then applied in turn. There is no `select_formats` parameter and no inference takes place (unlike in `lubridate::parse_date_time()`).
+* `read_arrow()` and `write_arrow()`, deprecated since 1.0.0 (July 2020), have been removed. Use the `read/write_feather()` and `read/write_ipc_stream()` functions depending on whether you're working with the Arrow IPC file or stream format, respectively.
+* `write_parquet()` now defaults to writing Parquet format version 2.4 (was 1.0). Previously deprecated arguments `properties` and `arrow_properties` have been removed; if you need to deal with these lower-level properties objects directly, use `ParquetFileWriter`, which `write_parquet()` wraps.

Review Comment:
   These are probably from a bad merge? You'll probably want to back out the NEWS change until after 9.0.0 is released, and then add it in the right place. (We can hold this PR until then.)



##########
r/R/dplyr.R:
##########
@@ -110,6 +110,8 @@ make_field_refs <- function(field_names) {
 #' @export
 print.arrow_dplyr_query <- function(x, ...) {
   schm <- x$.data$schema
+  schm[["__filename"]] <- string()

Review Comment:
   So this worked? Great! Do you want to add a comment here explaining what's going on, or encapsulate this augmented-column handling into a well-named function?
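
   For illustration, a minimal sketch of the encapsulation suggested above;
   the helper name `with_augmented_fields()` is hypothetical, not something
   in the PR:

       # Hypothetical helper (name and placement are assumptions): gathers
       # the augmented-column handling into one place instead of doing it
       # inline in print.arrow_dplyr_query().
       with_augmented_fields <- function(schm) {
         # "__filename" is supplied by the Dataset scanner at scan time, so
         # it is not in the stored schema; add it as a string column so that
         # references to it resolve when the query is printed.
         schm[["__filename"]] <- string()
         schm
       }

       # print.arrow_dplyr_query() could then start with:
       # schm <- with_augmented_fields(x$.data$schema)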



##########
r/R/util.R:
##########
@@ -143,8 +143,8 @@ handle_parquet_io_error <- function(e, format, call) {
       msg,
       i = "Did you mean to specify a 'format' other than the default (parquet)?"
     )
+    abort(msg, call = call)

Review Comment:
   Oh, I see: this is so that you could chain together multiple of these. I suspect we could have a better developer experience here; this change makes it easy to accidentally swallow all other errors. Could you make a follow-up JIRA for this and leave a big comment above these handlers?
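
   A minimal sketch of the chaining pattern and of the fall-through re-raise
   that such a comment should call out; everything here except
   handle_parquet_io_error() is a placeholder, not the package's actual code:

       # Each handle_*_io_error() now calls abort() itself only when it
       # recognizes the error, so several handlers can be chained inside a
       # single tryCatch() error function.
       open_with_friendlier_errors <- function(sources, format) {
         call <- rlang::caller_env()
         tryCatch(
           arrow::open_dataset(sources, format = format),
           error = function(e) {
             handle_parquet_io_error(e, format, call)
             # Anything no handler claimed must be re-raised here; otherwise
             # the original error is silently swallowed.
             rlang::abort(conditionMessage(e), call = call)
           }
         )
       }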



##########
r/R/util.R:
##########
@@ -143,8 +143,8 @@ handle_parquet_io_error <- function(e, format, call) {
       msg,
       i = "Did you mean to specify a 'format' other than the default (parquet)?"
     )
+    abort(msg, call = call)

Review Comment:
   Why these changes?



##########
r/tests/testthat/test-dataset.R:
##########
@@ -1226,3 +1225,21 @@ test_that("FileSystemFactoryOptions input validation", {
     fixed = TRUE
   )
 })
+
+test_that("can add in augmented fields", {

Review Comment:
   It's fine; it would be better to catch this earlier, but that's not readily available.
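
   For reference, a sketch of roughly what a test like this might exercise;
   the add_filename() call and the dataset fixture are guesses at the
   user-facing API, not copied from the PR:

       test_that("can add in augmented fields", {
         ds <- open_dataset(dataset_dir)  # fixture name is a guess
         out <- ds %>%
           mutate(file = add_filename()) %>%
           collect()
         expect_true("file" %in% names(out))
         # every row should carry the path of the file it came from
         expect_true(all(nzchar(out$file)))
       })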



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org