You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/24 11:27:11 UTC

[GitHub] [arrow] thisisnic opened a new pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

thisisnic opened a new pull request #10381:
URL: https://github.com/apache/arrow/pull/10381


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] bkietz commented on a change in pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
bkietz commented on a change in pull request #10381:
URL: https://github.com/apache/arrow/pull/10381#discussion_r641134667



##########
File path: r/R/dplyr.R
##########
@@ -74,7 +74,13 @@ print.arrow_dplyr_query <- function(x, ...) {
     name <- expr$field_name
     if (nzchar(name)) {
       # Just a field_ref, so look up in the schema
-      schm$GetFieldByName(name)$type$ToString()
+      field_name <- schm$GetFieldByName(name)

Review comment:
       Specifically: looking up fields by name when that name is not unique in the schema is not supported. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10381:
URL: https://github.com/apache/arrow/pull/10381#discussion_r640492633



##########
File path: r/R/dplyr.R
##########
@@ -74,7 +74,13 @@ print.arrow_dplyr_query <- function(x, ...) {
     name <- expr$field_name
     if (nzchar(name)) {
       # Just a field_ref, so look up in the schema
-      schm$GetFieldByName(name)$type$ToString()
+      field_name <- schm$GetFieldByName(name)

Review comment:
       Have updated now




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10381:
URL: https://github.com/apache/arrow/pull/10381#discussion_r640168884



##########
File path: r/R/dplyr.R
##########
@@ -74,7 +74,13 @@ print.arrow_dplyr_query <- function(x, ...) {
     name <- expr$field_name
     if (nzchar(name)) {
       # Just a field_ref, so look up in the schema
-      schm$GetFieldByName(name)$type$ToString()
+      field_name <- schm$GetFieldByName(name)

Review comment:
       I checked with @bkietz and (currently at least) evaluating expressions on a dataset with duplicated field names will error, so if that's the case (perhaps you could verify), we should check this sooner than the print method. How about instead we put a check around L33 above, like
   
   ```r
   if (anyDuplicated(names(.data))) {
     abort("useful message")
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10381:
URL: https://github.com/apache/arrow/pull/10381#discussion_r641019402



##########
File path: r/R/dplyr.R
##########
@@ -31,6 +31,18 @@ arrow_dplyr_query <- function(.data) {
     return(.data)
   }
 
+  # Evaluating expressions on a dataset with duplicated fieldnames will error
+  if (anyDuplicated(names(.data))) {
+    abort(c(
+      "Duplicated field names",
+      x = paste0(
+        "The following field names were found more than once in the data: ",
+        paste(names(.data)[duplicated(names(.data))], collapse = ", ")
+      ),
+      i =  "All field names must be unique"
+    ))
+  }

Review comment:
       Not that it matters much but we can call names() and duplicated() less (and use our paste helper):
   
   ```suggestion
     dupes <- duplicated(names(.data))
     if (any(dupes)) {
       abort(c(
         "Duplicated field names",
         x = paste0(
           "The following field names were found more than once in the data: ",
           oxford_paste(names(.data)[dupes])
         ),
         i =  "All field names must be unique"
       ))
     }
   ```
   
   Also, do we need `i =  "All field names must be unique"`? Is that something that `abort()` suggests/insists on? It seems redundant.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10381:
URL: https://github.com/apache/arrow/pull/10381#issuecomment-846979067


   https://issues.apache.org/jira/browse/ARROW-12722


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10381:
URL: https://github.com/apache/arrow/pull/10381#discussion_r641398501



##########
File path: r/R/dplyr.R
##########
@@ -31,6 +31,18 @@ arrow_dplyr_query <- function(.data) {
     return(.data)
   }
 
+  # Evaluating expressions on a dataset with duplicated fieldnames will error
+  if (anyDuplicated(names(.data))) {
+    abort(c(
+      "Duplicated field names",
+      x = paste0(
+        "The following field names were found more than once in the data: ",
+        paste(names(.data)[duplicated(names(.data))], collapse = ", ")
+      ),
+      i =  "All field names must be unique"
+    ))
+  }

Review comment:
       Fair comment, it does not, and you're right it's redundant - have removed it now.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10381:
URL: https://github.com/apache/arrow/pull/10381#discussion_r641638494



##########
File path: r/tests/testthat/test-dplyr.R
##########
@@ -593,3 +594,12 @@ test_that("bad explicit type conversions with as.*()", {
   )
 
 })
+
+test_that("No duplicate field names are allowed in an arrow_dplyr_query", {
+  expect_error(
+    Table$create(tbl, tbl) %>%
+      filter(int > 0),
+    regexp = "The following field names were found more than once in the data: int, dbl, dbl2, lgl, false, chr, fct, verses, padded_strings"

Review comment:
       The last change altered the error message slightly so you'll need to fix (I'd make a `suggestion` here but you'll still need to pull and run to confirm)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson closed pull request #10381: ARROW-12722: [R] Raise error when attemping to print table with duplicated naming

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #10381:
URL: https://github.com/apache/arrow/pull/10381


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org