You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/15 12:43:03 UTC

[GitHub] [arrow] nealrichardson commented on a change in pull request #10646: ARROW-13189: [R] Disable row-level metadata application on datasets

nealrichardson commented on a change in pull request #10646:
URL: https://github.com/apache/arrow/pull/10646#discussion_r670424361



##########
File path: r/R/metadata.R
##########
@@ -60,9 +61,23 @@ apply_arrow_r_metadata <- function(x, r_metadata) {
         }
       }
     } else if (is.list(x) && !inherits(x, "POSIXlt") && !is.null(columns_metadata)) {
-      x <- map2(x, columns_metadata, function(.x, .y) {
-        apply_arrow_r_metadata(.x, .y)
-      })
+      # If we have a list and "columns_metadata" this applies row-level metadata
+      # inside of a column in a dataframe.
+
+      # However, if we are inside of a dataset collection, we cannot apply this
+      # row-level metadata, since the order of the rows is not gauranteed to be
+      # the same, so don't even try, but warn what's going on
+      stack <- call_stack()
+      in_dataset_collect <- any(map_lgl(stack, function(x) {
+        !is.null(x$fn_name) && x$fn_name == "collect.arrow_dplyr_query"
+      }))
+      if (in_dataset_collect) {
+        warning("Row-level metadata has been discarded")

Review comment:
       Does this message deserve more explanation? Trying to think of what I would do if I saw that message.
   
   Also: 
   
   ```suggestion
           warning("Row-level metadata has been discarded", call. = FALSE)
   ```

##########
File path: r/tests/testthat/test-metadata.R
##########
@@ -205,3 +205,27 @@ test_that("metadata of list elements (ARROW-10386)", {
   expect_identical(attr(as.data.frame(tab)$x[[1]], "foo"), "bar")
   expect_identical(attr(as.data.frame(tab)$x[[2]], "baz"), "qux")
 })
+
+
+test_that("metadata of list elements (ARROW-10386)", {
+  skip_if_not_available("dataset")
+  skip_if_not_available("parquet")
+  df <- tibble::tibble(
+    metadata = list(
+      structure(1, my_value_as_attr = 1),
+      structure(2, my_value_as_attr = 2),
+      structure(3, my_value_as_attr = 3),
+      structure(4, my_value_as_attr = 3)),
+    part = c(1, 3, 2, 1)
+  )
+
+  dst_dir <- make_temp_dir()
+  write_dataset(df, dst_dir, partitioning = "part")
+  ds <- open_dataset(dst_dir)
+  expect_warning(
+    df_from_ds <- dplyr::collect(ds),

Review comment:
       What happens if I `ds %>% select(part) %>% collect()`, do I get the warning? I'd think I shouldn't.

##########
File path: r/R/metadata.R
##########
@@ -60,9 +61,23 @@ apply_arrow_r_metadata <- function(x, r_metadata) {
         }
       }
     } else if (is.list(x) && !inherits(x, "POSIXlt") && !is.null(columns_metadata)) {
-      x <- map2(x, columns_metadata, function(.x, .y) {
-        apply_arrow_r_metadata(.x, .y)
-      })
+      # If we have a list and "columns_metadata" this applies row-level metadata
+      # inside of a column in a dataframe.
+
+      # However, if we are inside of a dataset collection, we cannot apply this
+      # row-level metadata, since the order of the rows is not gauranteed to be

Review comment:
       ```suggestion
         # row-level metadata, since the order of the rows is not guaranteed to be
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org